Abstract: We study the trade-off between delivery delay and
energy consumption in a delay tolerant network in which a 
message (or a file) has to be delivered to each of several 
destinations by epidemic relaying. In addition to the destinations, 
there are several other nodes in the network that can assist in 
relaying the message. We first assume that, at every instant, all 
the nodes know the number of relays carrying the packet and 
the number of destinations that have received the packet. We 
formulate the problem as a controlled continuous time Markov 
chain and derive the optimal closed loop control (i.e., forwarding 
policy). However, in practice, the intermittent connectivity in the 
network implies that the nodes may not have the required perfect 
knowledge of the system state. To address this issue, we obtain an 
ODE (i.e., a deterministic fluid) approximation for the optimally 
controlled Markov chain. This fluid approximation also yields
an asymptotically optimal open loop policy. Finally, we evaluate 
the performance of the deterministic policy over finite networks. 
Numerical results show that this policy performs close to the 
optimal closed loop policy. 



I. Introduction 

Delay tolerant networks (DTNs) [1] are sparse wireless ad
hoc networks with highly mobile nodes. In these networks, 
the link between any two nodes is up when these are within 
each other's transmission range, and is down otherwise. In 
particular, at any given time, it is unlikely that there is a 
complete route between a source and its destination. 

We consider a DTN in which a short message (also referred 
to as a packet) needs to be delivered to multiple (say M) 
destinations. There are also N potential relays that do not 
themselves "want" the message but can assist in relaying 
it to the nodes that do. At time $t = 0$, $N_0$ of the relays
have copies of the packet. All nodes are assumed to be 
mobile. In such a network, a common technique to improve 
packet delivery delay is epidemic relaying [2]. We consider a
controlled relaying scheme that works as follows. Whenever a 
node (relay or destination) carrying the packet meets a relay 
that does not have a copy of the packet, then the former has 
the option of either copying or not copying. When a node that 
has the packet meets a destination that does not, the packet 
can be delivered. 

We want to minimize the delay until a significant fraction
(say $\alpha$) of the destinations receive the packet; we refer to this
duration as delivery delay. Evidently, delivery delay can be
reduced if the number of carriers of the packet is increased
by copying it to relays. Such copying cannot be done
indiscriminately, however, as every act of copying between 
two nodes incurs a transmission cost. Thus, we focus on the 
problem of the control of packet forwarding. 

Related work: The analysis and control of DTNs with a single
source and a single destination have been widely studied.
Groenevelt et al. [3] modeled epidemic relaying and two-hop
relaying using Markov chains. They derived the average
delay and the number of copies generated until the time of
delivery. Zhang et al. [4] developed a unified framework based
on ordinary differential equations (ODEs) to study epidemic
routing and its variants.

Neglia and Zhang [5] were the first to study the optimal
control of relaying in DTNs with a single destination and 
multiple relays. They assumed that all the nodes have perfect 
knowledge of the number of nodes carrying the packet. Their 
optimal closed loop control is a threshold policy - when a relay 
that does not have a copy of the packet is met, the packet 
is copied if and only if the number of relays carrying the 
packet is below a threshold. Due to the assumption of complete 
knowledge, the reported performance is a lower bound for the 
cost in a real system. 

Altman et al. [6] addressed the optimal relaying problem for
a class of monotone relay strategies which includes epidemic
relaying and two-hop relaying. In particular, they derived static
and dynamic relaying policies. Altman et al. [7] considered
optimal discrete-time two-hop relaying. They also employed
stochastic approximation to facilitate online estimation of
network parameters. In another paper, Altman et al. [8] considered
a scenario where active nodes in the network continuously
spend energy while beaconing. Their paper studied the joint
problem of node activation and transmission power control.
These works ([6], [7], [8]) heuristically obtain fluid approximations
for DTNs and study open loop controls. Li et al. [9]
considered several families of open loop controls and obtained
optimal controls within each family.

Deterministic fluid models expressed as ordinary differential
equations have been used to approximate large Markovian
systems. Kurtz [10] obtained sufficient conditions for the convergence
of Markov chains to such fluid limits. Darling [11]
and, subsequently, Darling and Norris [12] generalized Kurtz's
results. Darling [11] considers the scenario when the Markovian
system satisfies the conditions in [10] only over a subset.
He shows that the scaled processes converge to a fluid limit
until they exit from this subset. Darling and Norris [12]
generalize the conditions for convergence prescribed in [10], e.g.,
uniform convergence of the mean drifts of Markov chains and
Lipschitz continuity of the limiting drift function.
Gast and Gaujal [13] address the scenario where the limiting
drift functions are not Lipschitz continuous. They prove that
under mild conditions, the stochastic system converges to the
solution of a differential inclusion. Gast et al. [14] study
an optimization problem on a large Markovian system. They
show that solving the limiting deterministic problem yields an
asymptotically optimal policy for the original problem.

Our Contributions: We formulate the problem as a controlled
continuous time Markov chain (CTMC) [15], and
obtain the optimal policy (Section III). The optimal policy
relies on complete knowledge of the network state at every
node, but availability of such information is constrained by
the same connectivity problem that limits packet delivery.
In the incomplete information setting, the decisions of the
nodes would have to depend upon their beliefs about the
network state. The nodes would need to update their beliefs
continuously with time, and also after each meeting with
another node. Such belief updates would involve maintaining
a complex information structure and are often impractical
for nodes with limited memory and computation capability.
Moreover, designing closed loop controls based on beliefs is
a difficult task [16], even more so in our context with multiple
decision makers, all of them equipped with distinct partial
information.

In view of the above difficulties, we adopt the following
approach. We show that when the number of nodes is large,
the optimally controlled network evolution is well approximated
by a deterministic dynamical system (Section IV).
The existing differential equation approximation results for
Markovian systems [10], [11] do not directly apply, as, in the
optimally controlled Markov chain that arises in our problem,
the mean drift rates are discontinuous and do not converge
uniformly. We extend the results to our problem setting in
our Theorem 4.1 in Section IV. Note that the differential
inclusion based approach of Gast and Gaujal [13] is not
directly applicable in our case, as it needs uniform convergence
of the mean drift rates. The limiting deterministic dynamics
then suggests a deterministic control that is asymptotically
optimal for the finite network problem, i.e., the cost incurred
by the deterministic control approaches the optimal cost as the
network size grows. We briefly consider the analogous control
of two-hop forwarding [17] in Section V. Our numerical
results illustrate that the deterministic policy performs close
to the complete information optimal closed loop policy for a
wide range of parameter values (Section VI).



In a nutshell, the ODE approach is quite common in the
modeling of such problems. Its validity in situations without
control is established by Kurtz [10], Darling and Norris [12],
etc. Our aim in this paper is to rigorously show the validity
of this limit under control in a few DTN problems.



II. The System Model 

We consider a set of $K := M + N$ mobile nodes. These
include $M$ destinations and $N$ relays. At $t = 0$, a packet is
generated and immediately copied to $N_0$ relays (e.g., via a
broadcast from an infrastructure network). Alternatively, these
$N_0$ nodes can be thought of as source nodes.

1) Mobility model: We model the point process of the meeting
instants between pairs of nodes as independent Poisson
point processes, each with rate $\lambda$. Groenevelt et al. [3] validate
this model for a number of common mobility models (random 
walker, random direction, random waypoint). In particular, 
they establish its accuracy under the assumptions of small 
communication range and sufficiently high speed of nodes. 

2) Communication model: Two nodes may communicate 
only when they come within transmission range of each other, 
i.e., at meeting instants. The transmissions are assumed to be 
instantaneous. We assume that each transmission of the
packet incurs unit energy expenditure at the transmitter. 

3) Relaying model: We assume that a controlled epidemic 
relay protocol is employed. 

Throughout, we use the terminology relating to the spread 
of infectious diseases. A node with a copy of the packet is said 
to be infected. A node is said to be susceptible until it receives 
a copy of the packet from another infected node. Thus at $t = 0$,
$N_0$ nodes are infected while $M + N - N_0$ are susceptible.

A. The Forwarding Problem 

The packet has to be disseminated to all the M destinations. 
However, the goal is to minimize the duration until a fraction
$\alpha$ ($\alpha < 1$) of the destinations receive the packet.

At each meeting epoch with a susceptible relay, an infected 
node (relay or destination) has to decide whether to copy the 
packet to the susceptible relay or not. Copying the packet 
incurs unit cost, but promotes early delivery of the packet to 
the destinations. We wish to find the trade-off between these 
costs by minimizing

$$\mathbb{E}\{T_d + \gamma \mathcal{E}_c\} \tag{1}$$

where $T_d$ is the time until which at least $M_\alpha := \lceil \alpha M \rceil$
destinations receive the packet, $\mathcal{E}_c$ is the total energy consumed
in copying, and $\gamma$ is the parameter that relates energy
consumption cost to delay cost. Varying $\gamma$ helps in studying the
trade-off between the delay and the energy costs.

III. Optimal Epidemic Forwarding 

We derive the optimal forwarding policy under the as- 
sumption that, at any instant of time, all the nodes have full 
information about the number of relays carrying the packet 
and the number of destinations that have received the packet. 
This assumption will be relaxed in the next section. 

A. The MDP Formulation 

Let $t_k$, $k = 1, 2, \dots$ denote the meeting epochs of the
infected nodes (relays or destinations) with the susceptible
nodes. Let $t_0 := 0$ and define $\delta_k := t_k - t_{k-1}$ for $k \ge 1$.

Fig. 1. Evolution of the controlled Markov chain $\{s_k\}$. Note that $(m_k, n_k)$
is embedded at $t_k-$, i.e., just before the meeting epoch.



Let $m(t)$ and $n(t)$ be the numbers of infected destinations
and relays, respectively, at time $t$. In particular, $m(0) = 0$
and $n(0) = N_0$, and the forwarding process stops at time $t$ if
$m(t) = M$. We use $m_k$ and $n_k$ to mean $m(t_k-)$ and $n(t_k-)$,
which are the numbers of infected destinations and relays,
respectively, just before the meeting epoch $t_k$. Let $e_k$ describe
the type of the susceptible node that an infected node meets
at $t_k$; $e_k \in \mathcal{E} := \{d, r\}$ where $d$ and $r$ stand for destination
and relay, respectively. The state of the system at a meeting
epoch $t_k$ is given by the tuple

$$s_k := (m_k, n_k, e_k).$$

Since the forwarding process stops at time $t$ if $m(t) = M$, the
state space is $[M - 1] \times [N_0 : N] \times \mathcal{E}$.^1

Let $u_k$ be the action of the infected node at meeting epoch
$t_k$, $k = 1, 2, \dots$. The control space is $\mathcal{U} := \{0, 1\}$, where 1 is
for copy and 0 is for do not copy. The embedding convention
described above is shown in Figure 1.

We treat the tuple $(\delta_{k+1}, e_{k+1})$ as the random disturbance
at epoch $t_k$. Note that for $k = 1, 2, \dots$, the time between
successive decision epochs, $\delta_k$, is independent and exponentially
distributed with parameter $(m_k + n_k)(M + N - m_k - n_k)\lambda$.
Furthermore, with "w.p." standing for "with probability", we
have

$$e_k = \begin{cases} d & \text{w.p. } p_{m_k,n_k}(d) := \dfrac{M - m_k}{M + N - m_k - n_k}, \\[2mm] r & \text{w.p. } p_{m_k,n_k}(r) := \dfrac{N - n_k}{M + N - m_k - n_k}. \end{cases}$$


1) Transition structure: From the description of the system
model, the state at time $k + 1$ is given by $s_{k+1} = (m_k + u_k, n_k, e_{k+1})$
if $e_k = d$, and $s_{k+1} = (m_k, n_k + u_k, e_{k+1})$
if $e_k = r$. Recall that $e_{k+1}$ is a component of the random
disturbance. Thus the next state is a function of the current
state, the current action and the current disturbance, as required
for an MDP.
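The transition structure can be exercised with a short simulation. The following is a minimal sketch, not the authors' code: the function name `simulate` and the callback `policy(m, n)` are hypothetical, and the sketch assumes (as in the forwarding problem above) that susceptible destinations are always copied to and that every transmission counts as one copy.

```python
import random

def simulate(M, N, N0, lam, m_alpha, policy, seed=0):
    """Simulate one run of the controlled epidemic chain until m(t) = M.

    policy(m, n) -> 0 or 1 is consulted only when an infected node meets a
    susceptible relay; susceptible destinations always receive the packet.
    Returns (delay until m_alpha destinations are infected, copies, m, n).
    """
    rng = random.Random(seed)
    m, n, t = 0, N0, 0.0
    copies = 0
    delay = None
    while m < M:
        susceptible = M + N - m - n
        # Time to the next infected-susceptible meeting is exponential
        # with rate lam * (m + n) * (M + N - m - n).
        t += rng.expovariate(lam * (m + n) * susceptible)
        # The susceptible node met is a destination w.p. (M - m) / susceptible.
        if rng.random() < (M - m) / susceptible:
            m += 1
            copies += 1
            if delay is None and m >= m_alpha:
                delay = t
        elif policy(m, n) == 1:
            n += 1
            copies += 1
    return delay, copies, m, n
```

Passing `lambda m, n: 1` gives uncontrolled epidemic relaying, while `lambda m, n: 0` never copies to relays, so only the initial $N_0$ carriers and the infected destinations spread the packet.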

2) Cost Structure: For a state-action pair $(s_k, u_k)$ the
expected single stage cost is given by

$$g(s_k, u_k) = \gamma u_k + \mathbb{E}\{\delta_{k+1} 1_{\{m_{k+1} < M_\alpha\}}\},$$

where the expectation is taken with respect to the random
disturbance $(\delta_{k+1}, e_{k+1})$. It can be observed that

$$g(s_k, u_k) = \begin{cases} \gamma u_k & \text{if } s_k \text{ is such that } m_k \ge M_\alpha, \\ \gamma & \text{if } s_k = (M_\alpha - 1, n, d) \text{ and } u_k = 1, \\ \gamma u_k + C_d(s_k, u_k) & \text{otherwise,} \end{cases}$$

where

$$C_d(s_k, u_k) = \frac{1}{(m_k + n_k + u_k)(M + N - m_k - n_k - u_k)\lambda}$$

is the mean time until the next decision epoch. The quantity
$\gamma$ is expended whenever $u_k = 1$, i.e., the action is to copy.

^1 We use the notation $[a] = \{0, 1, \dots, a\}$ and $[a : b] = \{a, a + 1, \dots, b\}$ for
$b \ge a + 1$ and $a, b \in \mathbb{Z}_+$.

3) Policies: A policy $\pi$ is a sequence of mappings $\{u_k^\pi, k = 0, 1, 2, \dots\}$,
where $u_k^\pi : [M - 1] \times [N_0 : N] \times \mathcal{E} \to \mathcal{U}$. The
cost of an admissible policy $\pi$ for an initial state $s = (m, n, e)$
is

$$J_\pi(s) = \sum_{k=0}^{\infty} \mathbb{E}\{ g(s_k, u_k^\pi(s_k)) \mid s_0 = s \}.$$

Let $\Pi$ be the set of all admissible policies. Then the optimal
cost function is defined as

$$J(s) = \min_{\pi \in \Pi} J_\pi(s).$$

A policy $\pi$ is called stationary if $u_k^\pi$ are identical, say $u$, for
all $k$. For brevity we refer to such a policy as the stationary
policy $u$. A stationary policy $u^* = \{u^*, u^*, \dots\}$ is optimal if
$J_{u^*}(s) = J(s)$ for all states $s$.

4) Total Cost: We now translate the optimal cost-to-go from
the first meeting instant into optimal total cost. Recall that
at the first decision instant $t_1$, the state $s_1$ is $(0, N_0, r)$ or
$(0, N_0, d)$ depending on whether the susceptible node that is
met is a relay or a destination. The objective function (1) can
then be restated as

$$\mathbb{E}_\pi\{T_d + \gamma \mathcal{E}_c\} = \frac{1}{\lambda N_0 (M + N - N_0)} + \frac{N - N_0}{M + N - N_0} J_\pi(0, N_0, r) + \frac{M}{M + N - N_0} J_\pi(0, N_0, d) \tag{2}$$

where the subscript $\pi$ shows dependence on the underlying
policy. On the right hand side, the first term $\frac{1}{\lambda N_0 (M + N - N_0)}$ is
the average delay until the first decision instant, which has to
be borne under any policy.



B. Optimal Policy 

Since the cost function $g(\cdot)$ is nonnegative, Proposition 1.1
in [15, Chapter 3] implies that the optimal cost function will
satisfy the following Bellman equation. For $s = (m, n, e)$,

$$J(s) = \min_{u \in \{0, 1\}} A(s, u)$$

where $A(s, u) = g(s, u) + \mathbb{E}(J(s') \mid s, u)$.

Here $s'$ denotes the next state which depends on $s$, $u$ and the
random disturbance in accordance with the transition structure
described above. The expectation is taken with respect to the
random disturbance. Furthermore, since the action space is
finite, there exists a stationary optimal policy $u^*$ such that, for
all $s$, $u^*(s)$ attains the minimum in the above Bellman equation
(see [15, Chapter 3]). In the following we characterize this
stationary optimal policy.

First, observe that it is always optimal to copy to a destination,
that is, the optimal policy satisfies $u^*(m, n, d) = 1$ for
all $(m, n) \in [M - 1] \times [N_0 : N]$. Moreover, once a fraction $\alpha$
of the destinations have obtained the packet, no further delay
cost is incurred, and so further copying to relays does not help:
$u^*(m, n, r) = 0$ for all $(m, n) \in [M_\alpha : M - 1] \times [N_0 : N]$.

Next, focus on a reduced state space $[M_\alpha - 1] \times [N_0 : N] \times \{r\}$.
Consider the following one step look ahead
policy [15, Section 3.4]. At a meeting with a susceptible relay,
say when the state is $(m, n, r)$, compare the following two
action sequences.

1) 0s: stop, i.e., do not copy to this relay or to any susceptible
relays met in the future,

2) 1s: copy to this relay and then stop.

The costs to go corresponding to the action sequences 0s and
1s are, respectively,

$$J_{0s}(m, n, r) = (M - m)\gamma + \sum_{j=m}^{M_\alpha - 1} \frac{1}{\lambda (n + j)(M - j)}$$

and

$$J_{1s}(m, n, r) = (M - m + 1)\gamma + \sum_{j=m}^{M_\alpha - 1} \frac{1}{\lambda (n + j + 1)(M - j)}.$$

The stopping set $S_s$ is defined to be

$$S_s := \{(m, n, r) : \Phi(m, n) \le 0\}$$

where

$$\Phi(m, n) := J_{0s}(m, n, r) - J_{1s}(m, n, r) \tag{3}$$

$$= \sum_{j=m}^{M_\alpha - 1} \frac{1}{\lambda (n + j)(n + j + 1)(M - j)} - \gamma \tag{4}$$

for all $(m, n) \in [M_\alpha - 1] \times [N_0 : N]$. The one step look ahead
policy is to copy to the relay when $(m, n, r) \notin S_s$, and to stop
copying otherwise.^2

One step look ahead policies have been shown to be optimal
for stopping problems under certain conditions (see [18, Section 4.4]
and [15, Section 3.4]). Let us reemphasize that our
problem is not a stopping problem because an action 0 now is
not equivalent to stop as the resulting state is not a terminal
state; a susceptible relay that is met in the future may be
copied to even if the one met now is not. However, we exploit
the cost structure to prove that when an infected node meets a
susceptible relay, it can restrict attention to two actions: 1 (i.e.,
copy now) and stop (i.e., do not copy now and never copy
again). Subsequently, we also show that the above one step
look ahead policy (see (3)) is optimal.

Theorem 3.1: The optimal policy $u^* : [M - 1] \times [N_0 : N] \times \mathcal{E} \to \mathcal{U}$ satisfies

$$u^*(m, n, e) = \begin{cases} 1 & \text{if } e = d, \\ 1 & \text{if } e = r \text{ and } \Phi(m, n) > 0, \\ \text{stop} & \text{if } e = r \text{ and } \Phi(m, n) \le 0. \end{cases}$$

Proof: Though the optimal policy is a simple stopping
policy, the proof of its optimality is far from obvious. See
Appendix A. ■

^2 We use the standard convention that a sum over an empty index set is
0. Thus $\Phi(m, n) = -\gamma$ if $m \ge M_\alpha$. Consequently, for the states $[M_\alpha : M - 1] \times [N_0 : N] \times \{r\}$,
the one step look ahead policy prescribes stop. This
is consistent with our earlier discussion.

Fig. 2. An illustration of the optimal policy. The symbols 'X' mark the
states in which the optimal action (at meeting with a relay) is to copy.

We illustrate the optimal policy using an example. Let
$M = 15$, $N = 50$, $N_0 = 10$, $\alpha = 0.8$, $\lambda = 0.001$ and
$\gamma = 1$. The "X" in Figure 2 are the states where the optimal
action (at meeting with a relay) is to copy. For example, if only
5 destinations have the packet, then relays are copied to if and
only if there are 24 or fewer infected relays. If 7 destinations
already have the packet and there are 19 infected relays, then
no further copying to relays is done.
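The threshold structure can be computed directly from the function $\Phi(m, n)$ in (4). The sketch below is illustrative only; the names `phi` and `relay_threshold` are hypothetical, and $M_\alpha$ is passed in as the integer `m_alpha` to avoid floating point issues in the ceiling.

```python
def phi(m, n, M, m_alpha, lam, gamma):
    """Phi(m, n) as in (4): delay saved by one more relay copy, minus its cost.

    The sum over an empty index set (m >= m_alpha) is 0, so Phi = -gamma there.
    """
    total = sum(1.0 / (lam * (n + j) * (n + j + 1) * (M - j))
                for j in range(m, m_alpha))
    return total - gamma


def relay_threshold(m, M, N, N0, m_alpha, lam, gamma):
    """Largest n in [N0, N - 1] with Phi(m, n) > 0, or None if there is none.

    n = N is skipped because no susceptible relay remains in that state.
    """
    best = None
    for n in range(N0, N):
        if phi(m, n, M, m_alpha, lam, gamma) > 0:
            best = n
    return best
```

With the parameters of the example ($M = 15$, $N = 50$, $N_0 = 10$, $M_\alpha = 12$, $\lambda = 0.001$, $\gamma = 1$), `relay_threshold(5, 15, 50, 10, 12, 0.001, 1.0)` evaluates to 24, which agrees with the statement that relays are copied to when 5 destinations have the packet and at most 24 relays are infected.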



IV. Asymptotically Optimal Epidemic Forwarding 

In states $[M_\alpha - 1] \times [N_0 : N] \times \{r\}$, the optimal action,
which is governed by the function $\Phi(m, n)$, requires perfect
knowledge of the network state $(m, n)$. This may not be available
to the decision maker due to intermittent connectivity. In
this section, we derive an asymptotically optimal policy that
does not require knowledge of the network's state but depends
only on the time elapsed since the generation of the packet.
Such a policy is implementable if the packet is time-stamped
when generated and the nodes' clocks are synchronized.



A. Asymptotic Deterministic Dynamics 

Our analysis closely follows Darling [11]. It is straightforward
to show that the equations that follow are the conditional
expected drift rates of the optimally controlled CTMC. For
$(m(t), n(t)) \in [M - 1] \times [N_0 : N]$, using the optimal policy
in Theorem 3.1, we get

$$\frac{d\,\mathbb{E}(m(t) \mid (m(t), n(t)))}{dt} = \lambda (m(t) + n(t))(M - m(t)), \tag{5a}$$

$$\frac{d\,\mathbb{E}(n(t) \mid (m(t), n(t)))}{dt} = \lambda (m(t) + n(t))(N - n(t)) 1_{\{\Phi(m(t), n(t)) > 0\}}. \tag{5b}$$



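Recalling that $K = M + N$ is the total number of nodes,
we study large $K$ asymptotics. Towards this, we consider a
sequence of problems indexed by $K$. The parameters of the
$K$th problem are denoted using the superscript $K$. Normalized
versions of these parameters, and normalized versions of the
system state are denoted as follows:

$$X = \frac{M^K}{K},\quad Y = \frac{N^K}{K},\quad X_\alpha = \frac{M_\alpha^K}{K},\quad Y_0 = \frac{N_0^K}{K},\quad \Lambda = K \lambda^K,\quad \Gamma = K \gamma^K, \tag{6}$$

$$x^K(t) = \frac{m^K(t)}{K} \quad \text{and} \quad y^K(t) = \frac{n^K(t)}{K}.$$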



Remarks 4.1: The pairwise meeting rate and the copying 
cost must both scale down as K increases. Otherwise, the 
delivery delay will be negligible and the total transmission cost 
will be enormous for any policy, and no meaningful analysis 
is possible. 
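The scaling can be made concrete with a small helper. This is a sketch under an assumption: it takes $\Lambda = K\lambda$ and $\Gamma = K\gamma$, which is the scaling that makes the drift rates (5a)-(5b) agree with their normalized forms; the class `FluidParams` and the function `normalize` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FluidParams:
    X: float        # fraction of nodes that are destinations, M / K
    Y: float        # fraction of nodes that are relays, N / K
    X_alpha: float  # target fraction of infected destinations, M_alpha / K
    Y0: float       # initial fraction of infected relays, N0 / K
    Lam: float      # scaled meeting rate, K * lambda (assumed scaling)
    Gam: float      # scaled copying cost, K * gamma (assumed scaling)


def normalize(M, N, m_alpha, N0, lam, gamma):
    """Map the parameters of a K-node problem to normalized fluid parameters."""
    K = M + N
    return FluidParams(X=M / K, Y=N / K, X_alpha=m_alpha / K, Y0=N0 / K,
                       Lam=K * lam, Gam=K * gamma)
```

Holding `Lam` and `Gam` fixed while $K$ grows forces $\lambda = \Lambda / K$ and $\gamma = \Gamma / K$ to shrink, which is the point of the remark above.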

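For each $K$, we define a scaled two-dimensional integer
lattice $\Delta^K$, with points spaced $1/K$ apart, such that
$(x^K(t), y^K(t)) \in \Delta^K$. Also, for $(x^K(t), y^K(t)) \in \Delta^K$, using
the notation in (6), the drift rates in (5a)-(5b) can be rewritten
as follows.

$$\frac{d\,\mathbb{E}(x^K(t) \mid (x^K(t), y^K(t)))}{dt} = f_1^K(x^K(t), y^K(t)) := \Lambda (x^K(t) + y^K(t))(X - x^K(t)), \tag{7a}$$

$$\frac{d\,\mathbb{E}(y^K(t) \mid (x^K(t), y^K(t)))}{dt} = f_2^K(x^K(t), y^K(t)) := \Lambda (x^K(t) + y^K(t))(Y - y^K(t)) 1_{\{\phi^K(x^K(t), y^K(t)) > 0\}} \tag{7b}$$

where, for $(x, y) \in \Delta^K$,

$$\phi^K(x, y) := \sum_{j=Kx}^{K X_\alpha - 1} \frac{1}{K \Lambda \left(y + \frac{j}{K}\right)\left(y + \frac{j+1}{K}\right)\left(X - \frac{j}{K}\right)} - \Gamma. \tag{8}$$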

We also define $(x(t), y(t)) \in [0, X] \times [Y_0, Y]$ as functions
satisfying the following ODEs: $x(0) = 0$, $y(0) = Y_0$, and for
$t \ge 0$,

$$\frac{dx(t)}{dt} = f_1(x(t), y(t)) := \Lambda (x(t) + y(t))(X - x(t)), \tag{9a}$$

$$\frac{dy(t)}{dt} = f_2(x(t), y(t)) := \Lambda (x(t) + y(t))(Y - y(t)) 1_{\{\phi(x(t), y(t)) > 0\}} \tag{9b}$$

where^3

$$\phi(x, y) := \int_{z=x}^{X_\alpha} \frac{dz}{\Lambda (y + z)^2 (X - z)} - \Gamma. \tag{10}$$

Finally, we redefine the delivery delay $T_d$ (see (1)) to be

$$\tau^K = \inf\{t \ge 0 : x^K(t) \ge X_\alpha\}, \tag{11}$$

$$\text{and} \quad \tau = \inf\{t \ge 0 : x(t) \ge X_\alpha\}. \tag{12}$$

Note that $\tau^K$ is a stopping time for the random process
$(x^K(t), y^K(t))$, whereas $\tau$ is a deterministic time instant.
Since $f_1^K(x, y)$ is bounded away from zero, $\tau^K < \infty$ with
probability 1. Similarly, on account of $f_1(x, y)$ being bounded
away from zero, $\tau < \infty$.

Kurtz [10] and Darling [11] studied convergence of CTMCs
to the solutions of ODEs. The following are the hypotheses for
the version of the limit theorem that appears in Darling [11].

(i) $\lim_{K \to \infty} P(\|(x^K(0), y^K(0)) - (x(0), y(0))\| > \epsilon) = 0$;

(ii) In the scaled process $(x^K(t), y^K(t))$, the jump rates are
$O(K)$ and drifts are $O(K^{-1})$;

(iii) $(f_1^K(x, y), f_2^K(x, y))$ converges to $(f_1(x, y), f_2(x, y))$
uniformly in $(x, y)$;

(iv) $(f_1(x, y), f_2(x, y))$ is Lipschitz continuous.

Observe that, in our case, only the first two hypotheses are
satisfied. In particular, $f_2^K(x, y)$ does not converge uniformly
to $f_2(x, y)$, and $f_2(x, y)$ is not Lipschitz over $[0, X_\alpha] \times [Y_0, Y]$.
Hence, the convergence results do not directly apply in our
context. However, there is some regularity we can exploit,
which we now summarize as easily checkable facts.

(a) $\phi^K(x, y)$ converges uniformly to $\phi(x, y)$;

(b) the drift rates $f_1(x, y)$ and $f_2(x, y)$ are bounded from
below and above;

(c) $f_1(x, y)$ is Lipschitz and $f_2(x, y)$ is locally Lipschitz; and

(d) for all small enough $\nu \in \mathbb{R}$, and all $(x, y)$ on the graph of
"$\phi(x, y) = \nu$", the direction in which the ODE progresses,
$(f_1(x, y), f_2(x, y))$, is not tangent to the graph.
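Fact (a) can be checked numerically at a single point. The sketch below is an illustration under stated assumptions: `phi_K` takes $\phi^K$ to be the sum $\Phi$ of (4) multiplied by $K$ under the scaling $\Lambda = K\lambda$, $\Gamma = K\gamma$ (written with integer counts so the summation limits are exact), and `phi_fluid` evaluates the integral in (10) with a plain midpoint rule. Both function names are hypothetical.

```python
def phi_fluid(x, y, X, X_alpha, Lam, Gam, steps=20000):
    """phi(x, y) as in (10), by the midpoint rule; -Gam when x >= X_alpha."""
    if x >= X_alpha:
        return -Gam
    h = (X_alpha - x) / steps
    total = 0.0
    for i in range(steps):
        z = x + (i + 0.5) * h
        total += h / (Lam * (y + z) ** 2 * (X - z))
    return total - Gam


def phi_K(m, n, K, M, m_alpha, Lam, Gam):
    """Finite-K counterpart at the lattice point (m / K, n / K).

    Each term 1 / (K Lam (y + j/K)(y + (j+1)/K)(X - j/K)) with y = n / K and
    X = M / K simplifies to K^2 / (Lam (n + j)(n + j + 1)(M - j)).
    """
    total = sum(K * K / (Lam * (n + j) * (n + j + 1) * (M - j))
                for j in range(m, m_alpha))
    return total - Gam


limit = phi_fluid(0.0, 0.2, X=0.2, X_alpha=0.16, Lam=0.05, Gam=50.0)
err_100 = abs(phi_K(0, 20, 100, 20, 16, 0.05, 50.0) - limit)
err_1000 = abs(phi_K(0, 200, 1000, 200, 160, 0.05, 50.0) - limit)
print(limit, err_100, err_1000)
```

At this point the gap between the finite sum and the integral shrinks roughly in proportion to $1/K$, as expected of a Riemann sum; a check at one point is of course not a proof of uniform convergence.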
We then prove the following result which is identical to [11,
Theorem 2.8].

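Theorem 4.1: Assume that $\alpha < 1$ and $Y_0 > 0$. Then, for
every $\epsilon, \delta > 0$,

$$\lim_{K \to \infty} P\left( \sup_{0 \le t \le \tau^K} \|(x^K(t), y^K(t)) - (x(t), y(t))\| > \epsilon \right) = 0,$$

$$\lim_{K \to \infty} P(|\tau^K - \tau| > \delta) = 0.$$

Proof: See Appendix B. ■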
We illustrate Theorem 4.1 using an example. Let $X = 0.2$,
$Y = 0.8$, $\alpha = 0.8$, $Y_0 = 0.2$, $\Lambda = 0.05$ and $\Gamma = 50$.
In Figure 3 we plot $(x(t), y(t))$ and sample trajectories of
$(x^K(t), y^K(t))$ for $K = 100$, 200 and 500. We indicate the
states at which the optimal policy stops copying to relays
(i.e., $\phi^K(x^K(t), y^K(t))$ goes below 0 (see Theorem 3.1)) and
the states at which the fraction of infected destinations crosses
$X_\alpha$. We also show the corresponding states in the fluid model.
The plots show that for large $K$, the fluid model captures the
random dynamics of the network very well.

^3 We use the convention that an integral assumes the value 0 if its lower
limit exceeds the upper limit. So, $\phi(x, y) = -\Gamma$ if $x \ge X_\alpha$.

Fig. 3. Simulation results: The top and bottom sub-plots respectively
show the fractions of infected destinations and relays as a function of time.
$(x^K(t), y^K(t))$ are obtained from a simulation of the controlled CTMC, and
$(x(t), y(t))$ from the ODEs. The marker 'X' indicates the states at which
copying to relays is stopped whereas 'O' indicates the states at which a
fraction $\alpha$ of destinations have the packet.

B. Asymptotically Optimal Policy 

Observe that $\phi(x, y)$ is decreasing in $x$ and $y$, both of
which are nondecreasing with $t$. Consequently $\phi(x(t), y(t))$
decreases with $t$. We define

$$\tau^* := \inf\{t \ge 0 : \phi(x(t), y(t)) \le 0\}. \tag{13}$$
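The time threshold in (13) has no closed form in general, but it is easy to approximate. The following is a rough sketch, assuming forward Euler integration of (9a)-(9b) with a fixed step `dt` and a midpoint rule for the integral in (10); the function names, the step size and the cap `t_max` are illustrative choices, not part of the paper.

```python
def phi(x, y, X, X_alpha, Lam, Gam, steps=200):
    """phi(x, y) as in (10), by the midpoint rule; -Gam when x >= X_alpha."""
    if x >= X_alpha:
        return -Gam
    h = (X_alpha - x) / steps
    return sum(h / (Lam * (y + x + (i + 0.5) * h) ** 2 * (X - x - (i + 0.5) * h))
               for i in range(steps)) - Gam


def fluid_thresholds(X, Y, X_alpha, Y0, Lam, Gam, dt=0.05, t_max=2000.0):
    """Euler-integrate (9a)-(9b); return (tau_star, tau, y at tau_star).

    The result is only meaningful if x reaches X_alpha before t_max.
    """
    x, y, t = 0.0, Y0, 0.0
    tau_star = None
    while x < X_alpha and t < t_max:
        if tau_star is None and phi(x, y, X, X_alpha, Lam, Gam) <= 0:
            tau_star = t  # phi only decreases, so relay copying never resumes
        dx = Lam * (x + y) * (X - x)
        dy = Lam * (x + y) * (Y - y) if tau_star is None else 0.0
        x, y, t = x + dt * dx, y + dt * dy, t + dt
    if tau_star is None:
        tau_star = t  # phi = -Gam <= 0 once x >= X_alpha
    return tau_star, t, y


tau_star, tau, y_stop = fluid_thresholds(X=0.2, Y=0.8, X_alpha=0.16,
                                         Y0=0.2, Lam=0.05, Gam=50.0)
print(tau_star, tau, y_stop)
```

The parameters in the usage lines are those of the example that illustrates Theorem 4.1. A smaller `dt` or a higher-order integrator would tighten the estimate; the structure of the computation stays the same.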



The limiting deterministic dynamics suggests the following
policy $u^\infty$ for the original forwarding problem.^4

$$u^\infty(m, n, e) = \begin{cases} 1 & \text{if } e = d, \\ 1 & \text{if } e = r \text{ and } t \le \tau^*, \\ 0 & \text{if } e = r \text{ and } t > \tau^*. \end{cases}$$

We show that the policy $u^\infty$ is asymptotically optimal in the
sense that its expected cost approaches the expected cost of
the optimal policy $u^*$ as the network grows. Let us restate the
cost of a policy $\pi$ as $\mathbb{E}_\pi^K\{T_d + \gamma \mathcal{E}_c\}$.
We have used superscript $K$ to show the dependence of
cost on the network size. We then establish the following
asymptotic optimality result.

^4 Observe that the policy $u^\infty$ does not require knowledge of $m$ and $n$. The
infected node readily knows the type of the susceptible node ($d$ or $r$) at the
decision epoch.

Theorem 4.2:

$$\lim_{K \to \infty} \mathbb{E}^K_{u^\infty}\{T_d + \gamma \mathcal{E}_c\} = \lim_{K \to \infty} \mathbb{E}^K_{u^*}\{T_d + \gamma \mathcal{E}_c\} = \tau + \Gamma y(\tau^*).$$

Proof: See Appendix C. ■

Remarks 4.2: Observe that we do not compare the limiting
value of the optimal costs with the optimal cost on the (limiting)
deterministic system. In general, these two may differ.^5
However, the deterministic policy $u^\infty$ can be applied on the
finite $K$-node system. The content of the above theorem is
that given any $\epsilon > 0$, the cost of the policy $u^\infty$ is within $\epsilon$ of the
optimal cost on the $K$-node system for all sufficiently large
$K$.

Distributed Implementation: The asymptotically optimal
policy can be implemented in a distributed fashion. Assume
that all the nodes are time synchronized.^6 Suppose that the
packet is generated at the source at time $t_0$ (we assumed $t_0 = 0$
for the purpose of analysis). Given the system parameters
$M$, $N$, $\alpha$, $N_0$, $\lambda$ and $\gamma$, the source first extracts $X$, $Y$, $X_\alpha$, $Y_0$, $\Lambda$
and $\Gamma$ as in (6). Then, it calculates $\tau^*$ (see (13)), and stores
$t_0 + \tau^*$ as a header in the packet.

The packet is immediately copied to $N_0$ relays, perhaps by
means of a broadcast from an infrastructure "base station".
When an infected node meets a susceptible relay, it compares
$t_0 + \tau^*$ with the current time. The susceptible relay is not
copied to if the current time exceeds $t_0 + \tau^*$. However, all
the infected nodes continue to carry the packet, and to copy
to susceptible destinations as and when they meet.
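The per-meeting decision described above reduces to a single timestamp comparison. A minimal sketch follows; the header layout `PacketHeader`, the field name `relay_copy_deadline` and the peer type strings `"d"` and `"r"` are all hypothetical, and the time threshold is assumed to have been computed beforehand by the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PacketHeader:
    relay_copy_deadline: float  # t0 + tau_star, stamped once by the source


def make_header(t0, tau_star):
    """Build the header at packet generation time t0."""
    return PacketHeader(relay_copy_deadline=t0 + tau_star)


def should_copy(header, peer_type, now):
    """Decide whether an infected node copies the packet to the node it meets.

    peer_type is "d" for a susceptible destination, "r" for a susceptible relay.
    """
    if peer_type == "d":
        return True  # destinations are always served
    if peer_type == "r":
        return now <= header.relay_copy_deadline  # stop once the deadline passes
    raise ValueError(f"unknown peer type: {peer_type!r}")
```

Only the deadline travels with the packet, so a node needs neither the network state $(m, n)$ nor the system parameters to apply the policy.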

Remarks 4.3: Consider a scenario where the interest is
in copying the packet to only a fraction $\alpha$ of the destinations.
Observe that for every $\epsilon > 0$,

$$\lim_{K \to \infty} P\left( \left| \frac{m(\tau)}{M} - \alpha \right| > \epsilon \right) = 0.$$

Thus, in large networks, copying to destinations can also be
stopped at time $\tau$ (see (12)) while ensuring that with large
probability the fraction of infected destinations is close to $\alpha$.
Consequently, all the relays can delete the packet and free
their memory at $\tau$. This helps when packets are large and
relay (cache) memory is limited.

V. Optimal Two-Hop Forwarding 
Instead of epidemic relaying one can consider two-hop
relaying [17]. Here, the $N_0$ source nodes can copy the packet
to any of the $N - N_0$ relays or $M$ destinations. The infected
destinations can also copy the packet to any of the susceptible
relays or destinations. However, the relays are allowed to
transmit the packet only to the destinations. Here also an
optimization problem similar to that in Section II-A arises.

^5 In our case these two indeed match. See Appendix D for a proof.

^6 In practice, due to variations in the clock frequency, the clocks at
different nodes will drift from each other. But the time differences are
negligible compared to the delays caused by intermittent connectivity in the
network. Moreover, when an infected node meets a susceptible node, clock
synchronization can be performed before the packet is copied.



Now, the decision epochs $t_k$, $k = 1, 2, \dots$ are the meeting
epochs of the infected nodes (sources, relays or destinations)
with the susceptible destinations and the meeting epochs of the
sources or infected destinations with the susceptible relays. We
can formulate an MDP with state

$$s_k := (m_k, n_k, e_k)$$

at instant $t_k$ where $m_k$, $n_k$ and $e_k$ are as defined in Section III-A.
The state space is $[M_\alpha - 1] \times [N_0 : N] \times \mathcal{E}$. The
control space is $\mathcal{U} = \{0, 1\}$, where 1 is for copy and 0 is for
do not copy. We also get a transition structure identical to that
in Section III-A.

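For a state-action pair $(s_k, u_k)$ the expected single stage
cost is given by

$$g(s_k, u_k) = \gamma u_k + \mathbb{E}\{\delta_{k+1} 1_{\{m_{k+1} < M_\alpha\}}\} = \begin{cases} \gamma u_k & \text{if } s_k \text{ is such that } m_k \ge M_\alpha, \\ \gamma & \text{if } s_k = (M_\alpha - 1, n, d) \text{ and } u_k = 1, \\ \gamma u_k + C_d(s_k, u_k) & \text{otherwise,} \end{cases}$$

where

$$C_d(s_k, u_k) = \frac{1}{(m_k + n_k + u_k)(M - m_k - u_k 1_{\{e_k = d\}})\lambda + (m_k + u_k 1_{\{e_k = d\}} + N_0)(N - n_k - u_k 1_{\{e_k = r\}})\lambda}$$

is the mean time until the next decision epoch. As before, the
quantity $\gamma u_k$ accounts for the transmission energy.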

Let $u^* : [M_\alpha - 1] \times [N_0 : N] \times \mathcal{E} \to \mathcal{U}$ be a stationary
optimal policy. As in Section III-B, the optimal policy satisfies
$u^*(m, n, d) = 1$ for all $(m, n) \in [M - 1] \times [N_0 : N]$, and
$u^*(m, n, r) = 0$ for all $(m, n) \in [M_\alpha : M - 1] \times [N_0 : N]$.
Thus, we focus on a reduced state space $[M_\alpha - 1] \times [N_0 : N] \times \{r\}$.
As before, we look for the one step look ahead
policy which turns out to be the same as that for epidemic
relaying. Finally, Theorem 3.1 holds for two-hop relaying as
well (see the proof in Appendix A).

Next, we turn to the asymptotically optimal control for two-hop
relaying. The following are the conditional expected drift
rates. For $(m(t), n(t)) \in [M_\alpha - 1] \times [N_0 : N]$,

$$\frac{d\,\mathbb{E}(m(t) \mid (m(t), n(t)))}{dt} = \lambda (m(t) + n(t))(M - m(t)),$$

$$\frac{d\,\mathbb{E}(n(t) \mid (m(t), n(t)))}{dt} = \lambda (m(t) + N_0)(N - n(t)) 1_{\{\Phi(m(t), n(t)) > 0\}}.$$



We employ the same scaling and notations as in (6). The drift
rates in terms of $(x^K(t), y^K(t)) \in [0, X_\alpha] \times [Y_0, Y]$ are

$$\frac{d\,\mathbb{E}(x^K(t) \mid (x^K(t), y^K(t)))}{dt} = f_1^K(x^K(t), y^K(t)) := \Lambda (x^K(t) + y^K(t))(X - x^K(t)),$$

$$\frac{d\,\mathbb{E}(y^K(t) \mid (x^K(t), y^K(t)))}{dt} = f_2^K(x^K(t), y^K(t)) := \Lambda (x^K(t) + Y_0)(Y - y^K(t)) 1_{\{\phi^K(x^K(t), y^K(t)) > 0\}}.$$

Now, $x(t)$, $y(t)$ are defined as functions satisfying $x(0) = 0$,
$y(0) = Y_0$ and for $t \ge 0$,

$$\frac{dx(t)}{dt} = f_1(x(t), y(t)) := \Lambda (x(t) + y(t))(X - x(t)),$$

$$\frac{dy(t)}{dt} = f_2(x(t), y(t)) := \Lambda (x(t) + Y_0)(Y - y(t)) 1_{\{\phi(x(t), y(t)) > 0\}}.$$

The analysis in Section IV applies to two-hop relaying as
well. In particular, Theorems 4.1 and 4.2 hold. However, for
the identical system parameters ($M$, $N$, $\alpha$, $\lambda$ and $\gamma$) and initial
state ($N_0$), the value of the time-threshold $\tau^*$ will be larger
on account of the slower rates of infection of relays and
destinations.

Fig. 4. An illustration of the epidemic and two hop trajectories ($y$ versus $x$). The plots
also show the graph of '$\phi(x, y) = 0$'.

We illustrate the comparison between epidemic and two-hop relaying using an example. Let $X = 0.2$, $Y = 0.8$, $\alpha = 0.8$, $Y_0 = 0.2$, $\Lambda = 0.05$ and $\Gamma = 50$. In Figure 4 we plot the graph of "$\phi(x, y) = 0$", and also the '$y$ versus $x$' trajectories corresponding to epidemic and two-hop relaying. In Figure 5 we plot the trajectories of $(x(t), y(t))$ corresponding to epidemic and two-hop relaying. As anticipated, the value of the time-threshold $\tau^*$ is larger for two-hop relaying than for epidemic relaying. Moreover, the number of transmissions is smaller while the delivery delay is larger under the controlled two-hop relaying.
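The comparison above can be reproduced approximately with a short numerical sketch. It integrates the two fluid models with a forward Euler scheme and stops when the switching function becomes non-positive. The form of `phi` used here (an integral of $1/((z+y)^2(X-z))$ scaled by $1/\Lambda$, minus $\Gamma$) is an assumption inferred from the derivations in the appendices, and the function names, step size and horizon are illustrative choices, not taken from the paper.

```python
def phi(x, y, X, X_alpha, Lam, Gam, n=200):
    """Assumed switching function:
    (1/Lam) * integral_x^{X_alpha} dz / ((z+y)^2 (X-z)) - Gam (midpoint rule)."""
    if x >= X_alpha:
        return -Gam
    h = (X_alpha - x) / n
    total = 0.0
    for i in range(n):
        z = x + (i + 0.5) * h
        total += 1.0 / ((z + y) ** 2 * (X - z))
    return total * h / Lam - Gam


def threshold_time(two_hop, X=0.2, Y=0.8, alpha=0.8, Y0=0.2,
                   Lam=0.05, Gam=50.0, dt=0.05, t_max=500.0):
    """Euler-integrate the fluid model while copying to relays is on.

    Returns (t, x, y) at the first step where phi <= 0 (or x reaches X_alpha).
    """
    X_alpha = alpha * X
    x, y, t = 0.0, Y0, 0.0
    while t < t_max and x < X_alpha and phi(x, y, X, X_alpha, Lam, Gam) > 0:
        dx = Lam * (x + y) * (X - x)
        # Two-hop: only destinations and the N0 initial relays copy to relays.
        carriers = (x + Y0) if two_hop else (x + y)
        dy = Lam * carriers * (Y - y)
        x, y, t = x + dt * dx, y + dt * dy, t + dt
    return t, x, y
```

Calling `threshold_time(False)` and `threshold_time(True)` gives the stopping epochs for epidemic and two-hop relaying under these assumptions; the two-hop epoch should not be smaller, in line with the discussion above.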

VI. Numerical Results 

We now present numerical results that demonstrate the good performance of the deterministic control in epidemic forwarding in a DTN with multiple destinations. Let $X = 0.2$, $Y = 0.8$, $\alpha = 0.8$, $Y_0 = 0.2$ and $\gamma = 0.5$. We vary $\lambda$ from 0.00005 to 0.05 and use $K = 50$, 100 and 200. In Figure 6 we plot the total number of copies to relays and the delivery delays corresponding to both the optimal and the asymptotically optimal deterministic policies. Evidently, the deterministic policy performs close to the optimal policy on both fronts. We observe that, for a fixed $K$, both the mean delivery delay and the mean number of copies to relays decrease as $\lambda$ increases. We also observe that, for a fixed $\lambda$, the mean delivery delay decreases as the network size grows. Finally, for smaller values of $\lambda$, the mean number of copies to relays increases with the network size, and for larger values of $\lambda$, the opposite happens.

Fig. 5. The top and bottom sub-plots respectively show the fractions of infected destinations and relays as a function of time $t$, for epidemic and two-hop relaying. The marker 'X' indicates the states at which copying to relays is stopped, and 'O' indicates the states at which a fraction $\alpha$ of destinations have been copied.

Fig. 6. The top and bottom sub-plots, respectively, show the total number of copies to relays and the delivery delays corresponding to both the optimal and the deterministic policies, for $K = 50$, 100 and 200.
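To make the experiment concrete, the following is a minimal sketch of how one sample path of the controlled chain could be simulated under an open-loop (time-threshold) policy. It is not the authors' simulation code. The function name `simulate`, the value passed as `stop_time`, and the seed are hypothetical; the transition rates follow the epidemic drift rates, with copying to relays switched off after `stop_time`.

```python
import random


def simulate(M, N, N0, alpha, lam, stop_time, seed):
    """One sample path: returns (delivery delay, number of copies to relays).

    Relays are copied only while t < stop_time (deterministic threshold).
    """
    rng = random.Random(seed)
    M_alpha = int(round(alpha * M))  # destinations that must be reached
    m, n, t = 0, N0, 0.0
    while m < M_alpha:
        copying = t < stop_time
        rate_d = lam * (m + n) * (M - m)                  # destination gets infected
        rate_r = lam * (m + n) * (N - n) if copying else 0.0  # relay gets infected
        total = rate_d + rate_r
        dt = rng.expovariate(total)
        if copying and t + dt > stop_time:
            # The policy switches before the next event; by memorylessness
            # we can restart the exponential clock at stop_time.
            t = stop_time
            continue
        t += dt
        if rng.random() * total < rate_d:
            m += 1
        else:
            n += 1
    return t, n - N0
```

Averaging the two returned quantities over many seeds, for several values of `lam` and network sizes, gives curves of the kind described above; the threshold itself would come from the fluid model.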

VII. Conclusion 

We studied epidemic forwarding in DTNs, formulated the problem as a controlled continuous time Markov chain, and obtained the optimal policy (Theorem 3.1). We then developed an ordinary differential equation (ODE) approximation for the optimally controlled Markov chain, under a natural scaling, as the population of nodes increases to $\infty$ (Theorem 4.1). This ODE approximation yielded a forwarding policy that does not require global state information (and, hence, is implementable), and is asymptotically optimal (Theorem 4.2).

The optimal forwarding problem can also be addressed following the result of Gast et al. [14]. They study a general discrete time Markov decision process (MDP) [18]. However, they do not solve the finite problem, citing the difficulties associated with obtaining the asymptotics of the optimally controlled process (see [14, Section 3.3]). Instead, they consider the fluid limit of the MDP, and analyze optimal control over the deterministic limiting problem. They then show that the optimal reward of the MDP converges to the optimal reward of its mean field approximation, given by the solution to a Hamilton-Jacobi-Bellman (HJB) equation [18, Section 3.2]. On the other hand, our approach is more direct. We have a continuous time controlled Markov chain at our disposal. We explicitly characterize the optimal policy for the finite (complete information) problem, and prove convergence of the optimally controlled Markov chain to a fluid limit. An asymptotically optimal deterministic control is then suggested by the limiting deterministic dynamics, and does not require solving HJB equations. Our notion of asymptotic optimality is also stronger in the sense that we apply both the optimal policy and the deterministic policy to the finite problem, and show that the corresponding costs converge.

There are several directions in which this work can be 
extended. In the same DTN framework, there could be a 
deadline on the delivery time of the packet (or message); the 
goal of the optimal control could be to maximize the fraction 
of destinations that receive the packet before the deadline 
subject to an energy constraint. Our work in this paper assumes that network parameters such as $M$, $N$, $\lambda$, etc., are known; it will be important to address the adaptive control problem when these parameters are unknown.
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Appendix A 
Proof of Theorem 3.1

We first prove that for the optimal policy it is sufficient to consider two actions: 1 (i.e., copy now) and stop (i.e., do not copy now and never copy again). More precisely, under the optimal policy, if a susceptible relay that is met is not copied, then no susceptible relay is copied in the future either. Let us fix an $n$ with $N_0 \le n \le N - 1$. Let $m^*$ be the maximum $j$ such that $u^*(j,n,r) = 1$.^1 We show that $u^*(j,n,r) = 1$ for all $0 \le j \le m^*$; see Figure 2 for an illustration of this fact. The proof is via induction.

Proposition A.1: If $u^*(j,n,r) = 1$ for all $m+1 \le j \le m^*$, then $u^*(m,n,r) = 1$.

Proof: Define

$$\psi(m,n) = J_{0s}(m,n,r) - J(m,n,r),$$
$$\theta_0(m,n) = J_{0s}(m,n,r) - A((m,n,r), 0),$$
$$\text{and } \theta_1(m,n) = J_{1s}(m,n,r) - A((m,n,r), 1).$$

Both the action sequences that give rise to the two cost terms in the definition of $\theta_0(m,n)$ do not copy to the susceptible relay that was just met. Let $j$ be the number of infected destinations at the next decision epoch when a susceptible relay is met; $j$ can be $m, m+1, \dots$. All interim decision epochs must be meetings with susceptible destinations, and both policies copy at these meetings. Hence, both policies incur the same cost until this epoch, and differ by $\psi(j,n)$ in the costs to go (from this epoch onwards). Averaging the difference over $j$, and noting that $\psi(j,n) = 0$ for $j > M_\alpha - 1$, we get^2

$$\theta_0(m,n) = \sum_{j=m}^{M_\alpha - 1} \left( \prod_{l=m}^{j-1} p_{l,n}(d) \right) p_{j,n}(r) \, \psi(j,n). \quad (14)$$

Since $A((m,n,r), 0) \ge J(m,n,r)$, it follows that $\psi(m,n) \ge \theta_0(m,n)$, and so

$$\psi(m,n) \ge \sum_{j=m}^{M_\alpha - 1} \left( \prod_{l=m}^{j-1} p_{l,n}(d) \right) p_{j,n}(r) \, \psi(j,n) = p_{m,n}(r)\,\psi(m,n) + p_{m,n}(d) \sum_{j=m+1}^{M_\alpha - 1} \left( \prod_{l=m+1}^{j-1} p_{l,n}(d) \right) p_{j,n}(r) \, \psi(j,n),$$

which implies, upon rearrangement,

$$\psi(m,n) \ge \sum_{j=m+1}^{M_\alpha - 1} \left( \prod_{l=m+1}^{j-1} p_{l,n}(d) \right) p_{j,n}(r) \, \psi(j,n). \quad (15)$$

Next, we establish the following lemma.

Lemma A.1: $\theta_1(m,n) \ge \theta_1(m+1,n)$.

Proof: Note that both the action sequences that lead to the two cost terms in the definition of $\theta_1(m,n)$ copy at state $(m,n,r)$. Subsequently, both incur equal costs until a decision epoch when an infected node meets a susceptible relay. Also, at any such state $(j,n+1,r)$, $j \ge m$, the costs to go differ by $\psi(j,n+1)$. Hence,

$$\theta_1(m,n) = \sum_{j=m}^{M_\alpha - 1} \left( \prod_{l=m}^{j-1} p_{l,n+1}(d) \right) p_{j,n+1}(r) \, \psi(j,n+1) = p_{m,n+1}(r)\,\psi(m,n+1) + p_{m,n+1}(d)\,\theta_1(m+1,n),$$

where

$$\theta_1(m+1,n) = \sum_{j=m+1}^{M_\alpha - 1} \left( \prod_{l=m+1}^{j-1} p_{l,n+1}(d) \right) p_{j,n+1}(r) \, \psi(j,n+1).$$

Thus it suffices to show that

$$\psi(m,n+1) \ge \theta_1(m+1,n),$$

which is the same as (15) with $n$ replaced by $n+1$. ■

^1 Note that, for a given $n$, $m^*$ could be 0; in that case we do not copy to any more relays.

^2 We use the standard convention that a product over an empty index set is 1, which happens when $j = m$.

Next, observe that for all $m \le j \le m^*$,

$$\psi(j,n) = J_{0s}(j,n,r) - \min\{A((j,n,r),0), A((j,n,r),1)\} = \max\{\theta_0(j,n), \Phi(j,n) + \theta_1(j,n)\}. \quad (16)$$

Moreover, from the induction hypothesis, the optimal policy copies at states $(j,n,r)$ for all $m+1 \le j \le m^*$. Hence, for $m+1 \le j \le m^*$,

$$\psi(j,n) = \Phi(j,n) + \theta_1(j,n).$$

Finally, $\psi(j,n) = 0$ for all $m^* < j \le M_\alpha - 1$ as the optimal policy does not copy in these states. Hence, from (14),

$$\theta_0(m,n) = p_{m,n}(r) \max\{\theta_0(m,n), \Phi(m,n) + \theta_1(m,n)\} + p_{m,n}(d) \sum_{j=m+1}^{m^*} \left( \prod_{l=m+1}^{j-1} p_{l,n}(d) \right) p_{j,n}(r) \, (\Phi(j,n) + \theta_1(j,n))$$
$$< p_{m,n}(r) \max\{\theta_0(m,n), \Phi(m,n) + \theta_1(m,n)\} + p_{m,n}(d) \, (\Phi(m,n) + \theta_1(m,n)) \sum_{j=m+1}^{m^*} \left( \prod_{l=m+1}^{j-1} p_{l,n}(d) \right) p_{j,n}(r)$$
$$\le p_{m,n}(r) \max\{\theta_0(m,n), \Phi(m,n) + \theta_1(m,n)\} + p_{m,n}(d) \, (\Phi(m,n) + \theta_1(m,n))$$
$$= \max\{p_{m,n}(r)\,\theta_0(m,n) + p_{m,n}(d)\,(\Phi(m,n) + \theta_1(m,n)), \; \Phi(m,n) + \theta_1(m,n)\}, \quad (17)$$

where the first (strict) inequality holds because $\Phi(m,n)$ is strictly decreasing and $\theta_1(m,n)$ is decreasing (see Lemma A.1) in $m$ for fixed $n$. The second inequality follows because the summation term is a probability, which is at most 1. Now suppose that $\theta_0(m,n) > \Phi(m,n) + \theta_1(m,n)$. Then

$$\max\{p_{m,n}(r)\,\theta_0(m,n) + p_{m,n}(d)\,(\Phi(m,n) + \theta_1(m,n)), \; \Phi(m,n) + \theta_1(m,n)\} = p_{m,n}(r)\,\theta_0(m,n) + p_{m,n}(d)\,(\Phi(m,n) + \theta_1(m,n)) < \theta_0(m,n),$$

which contradicts (17). Thus, we conclude that

$$\theta_0(m,n) \le \Phi(m,n) + \theta_1(m,n).$$

This further implies that $\psi(m,n) = \Phi(m,n) + \theta_1(m,n)$ (see (16)), and so $u^*(m,n,r) = 1$. ■

We now return to the proof of Theorem 3.1. We show that the one-step look-ahead policy is optimal for the resulting stopping problem. To see this, observe that $\Phi(m,n)$ is decreasing in $m$ for a given $n$ and also decreasing in $n$ for a given $m$. Thus, if $(m,n,r) \in S_s$, i.e., $\Phi(m,n) \le 0$ (see (3)), and the susceptible relay that is met is copied, the next state $(m,n+1,r)$ also belongs to the stopping set $S_s$. In other words, $S_s$ is an absorbing set (see [18, Section 3.4]). Consequently, the one-step look-ahead policy is an optimal policy.



Appendix B 
Proof of Theorem 4.1

We start with a preliminary result and a few definitions.

Proposition B.1: Let $\alpha < 1$ and $Y_0 > 0$. Let $\phi^K$ and $\phi$ be as given in (8) and (10), respectively. Then, the functions $\phi^K(\cdot)$ converge to $\phi(\cdot)$ uniformly, i.e., for every $\nu > 0$, there exists a $K_\nu$ such that

$$\sup_{(x,y) \in A^K} |\phi^K(x,y) - \phi(x,y)| < \nu$$

for all $K > K_\nu$.

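Proof: For a $y \in [Y_0, Y]$, define $f_y : [0, X_\alpha] \to \mathbb{R}_+$ as follows.

$$f_y(z) = \frac{1}{(y+z)^2 (X-z)}.$$

Clearly, the family $\{f_y\}$ is positive and uniformly upper bounded. Indeed,

$$f_y(z) \le \frac{1}{Y_0^2 (X - X_\alpha)} =: f_{\max}.$$

Further,

$$\frac{df_y(z)}{dz} = \frac{1}{(y+z)^2 (X-z)} \left( \frac{1}{X-z} - \frac{2}{y+z} \right),$$

from which it can be seen that

$$\left| \frac{df_y(z)}{dz} \right| \le f'_{\max},$$

where $f'_{\max}$ is a suitably defined constant. So the family $\{f_y\}$ is uniformly Lipschitz. Now, for $(z,y) \in [0,X_\alpha] \times [Y_0,Y]$,

$$\frac{1}{K(y+z)(y+z+\frac{1}{K})(X-z)} - \int_z^{z+\frac{1}{K}} f_y(v)\,dv \le \frac{f_y(z)}{K} - \int_z^{z+\frac{1}{K}} f_y(v)\,dv = \int_z^{z+\frac{1}{K}} (f_y(z) - f_y(v))\,dv \le \frac{f'_{\max}}{K^2}, \quad (18)$$

where the first and the last inequalities follow from the definitions of $f_y(z)$ and $f'_{\max}$ respectively. On the other hand,

$$\frac{1}{(y+z)(y+z+\frac{1}{K})(X-z)} = f_y(z) \, \frac{y+z}{y+z+\frac{1}{K}} \ge f_y(z) \, \frac{Y_0}{Y_0 + \frac{1}{K}}.$$

Hence

$$\int_z^{z+\frac{1}{K}} f_y(v)\,dv - \frac{1}{K(y+z)(y+z+\frac{1}{K})(X-z)} \le \int_z^{z+\frac{1}{K}} f_y(v)\,dv - \frac{f_y(z)}{K} \cdot \frac{K Y_0}{1 + K Y_0} = \int_z^{z+\frac{1}{K}} (f_y(v) - f_y(z))\,dv + \frac{f_y(z)}{K(1 + K Y_0)} \le \frac{f'_{\max}}{K^2} + \frac{f_{\max}}{K(1 + K Y_0)}. \quad (19)$$

Combining (18) and (19),

$$\left| \frac{1}{K(y+z)(y+z+\frac{1}{K})(X-z)} - \int_z^{z+\frac{1}{K}} f_y(v)\,dv \right| \le \frac{f'_{\max}}{K^2} + \frac{f_{\max}}{K(1 + K Y_0)}.$$

Now fix an $(x,y) \in A^K$. Setting $z = j/K$, summing over $j \in [Kx : \lceil K X_\alpha \rceil - 1]$, and bounding the contribution of the last partial interval by $f_{\max}/(K\Lambda)$, we get

$$|\phi^K(x,y) - \phi(x,y)| \le \frac{X_\alpha f'_{\max}}{K \Lambda} + \frac{X_\alpha f_{\max}}{(1 + K Y_0) \Lambda} + \frac{f_{\max}}{K \Lambda}.$$

The upper bound on the right-hand side is independent of $(x,y) \in A^K$, and vanishes as $K \to \infty$. Thus, for every $\nu > 0$, there exists a $K_\nu$ such that

$$\sup_{(x,y) \in A^K} |\phi^K(x,y) - \phi(x,y)| < \nu$$

for all $K > K_\nu$. ■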
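The key estimate in the proof is that a discrete sum with summand $1/(K(y+z)(y+z+1/K)(X-z))$, $z = j/K$, approaches the integral of $f_y$ at rate $O(1/K)$. The sketch below checks this numerically for one illustrative point. The parameter values are borrowed from the two-hop example, and the helper names are hypothetical; the small tolerance inside `ceil` is only a guard against floating-point round-off.

```python
import math


def f(z, y, X):
    """f_y(z) = 1 / ((y+z)^2 (X-z))."""
    return 1.0 / ((y + z) ** 2 * (X - z))


def discrete_sum(K, x, y, X, X_alpha):
    """Sum over j in [Kx : ceil(K X_alpha) - 1] of the discrete summand."""
    lo = int(round(K * x))
    hi = math.ceil(K * X_alpha - 1e-9)
    return sum(1.0 / (K * (y + j / K) * (y + (j + 1) / K) * (X - j / K))
               for j in range(lo, hi))


def integral(x, y, X, X_alpha, n=20000):
    """Midpoint-rule approximation of the integral of f_y over [x, X_alpha]."""
    h = (X_alpha - x) / n
    return h * sum(f(x + (i + 0.5) * h, y, X) for i in range(n))


def gap(K, x=0.0, y=0.2, X=0.2, X_alpha=0.16):
    """Absolute difference between the discrete sum and the integral."""
    return abs(discrete_sum(K, x, y, X, X_alpha) - integral(x, y, X, X_alpha))
```

Evaluating `gap(K)` for increasing `K` (for instance 50, 200, 800) should show the gap shrinking roughly in proportion to `1/K`, which is the behaviour the bound predicts.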



In the following, to facilitate a parsimonious description, we use the notation $z^K(t) = (x^K(t), y^K(t))$, $z(t) = (x(t), y(t))$ and $Z = [0, X_\alpha] \times [Y_0, Y]$. Let us define, for a $\nu \in \mathbb{R}$,

$$S_\nu = \{z \in Z : \phi(z) \ge \nu\},$$
$$\tau_\nu = \inf\{t \ge 0 : z(t) \notin S_\nu\},$$

and a stopping time

$$\tau_\nu^K = \inf\{t \ge 0 : z^K(t) \notin S_\nu\},$$

the time when $z^K(t)$ exits the limiting set $S_\nu$. Observe that

$$\frac{\partial \phi}{\partial x} = -\frac{1}{\Lambda (x+y)^2 (X - x)} \le -\frac{1}{\Lambda (X_\alpha + Y)^2 X}, \quad (20)$$

and $f_1^K(x,y)$ defined in (7a) is positive and is also bounded away from zero. These imply that $\tau_\nu^K < \infty$ with probability 1. Similarly, $\tau_\nu < \infty$. The following assertion is a corollary of Proposition B.1.

Corollary B.1: Let $K_\nu$ be as in Proposition B.1. For $K > K_\nu$,

$$\phi^K(z) > 0 \text{ for all } z \in S_\nu,$$
$$\text{and } \phi^K(z) < 0 \text{ for all } z \notin S_{-\nu}.$$
We define the uncontrolled dynamics (i.e., the one in which the susceptible relays are always copied) as a Markov process $\bar{z}^K(t) = (\bar{x}^K(t), \bar{y}^K(t))$, $t \ge 0$, for which $\bar{z}^K(0) = z^K(0)$. Let $\bar{z}(t) = (\bar{x}(t), \bar{y}(t))$, $t \ge 0$, be the corresponding limiting deterministic dynamics. Formally, $\bar{z}(0) = z(0)$, and for $t \ge 0$,

$$\frac{d\bar{x}(t)}{dt} = \Lambda (\bar{x}(t) + \bar{y}(t))(X - \bar{x}(t)),$$
$$\frac{d\bar{y}(t)}{dt} = \Lambda (\bar{x}(t) + \bar{y}(t))(Y - \bar{y}(t)).$$

The quantities on the right-hand side of the above equations are at most $\Lambda$, and so

$$\left\| \frac{d\bar{z}}{dt} \right\| \le \sqrt{2}\,\Lambda.$$

Also observe that the processes $\bar{z}^K(t)$ and $\bar{z}(t)$ satisfy the hypotheses of Darling [11] (see Section IV-A), and thus convergence of $\bar{z}^K(t)$ to $\bar{z}(t)$ follows.

We also define a Markov process $\tilde{z}^K(t) = (\tilde{x}^K(t), \tilde{y}^K(t))$, $t \ge \tau_\nu$, for which $\tilde{z}^K(\tau_\nu) = z^K(\tau_\nu)$ and

$$\frac{dE(\tilde{x}^K(t) \mid (\tilde{x}^K(t), \tilde{y}^K(t)))}{dt} = f_1^K(\tilde{x}^K(t), \tilde{y}^K(t)),$$
$$\frac{dE(\tilde{y}^K(t) \mid (\tilde{x}^K(t), \tilde{y}^K(t)))}{dt} = 0.$$

In other words, $\tilde{z}^K(t)$ is the process in which relays are not copied from $\tau_\nu$ onwards. Similarly, we define $\tilde{z}(t) = (\tilde{x}(t), \tilde{y}(t))$, $t \ge \tau_\nu$, as the solution of the corresponding differential equations. In other words, $\tilde{z}(\tau_\nu) = z(\tau_\nu)$, and for $t \ge \tau_\nu$,

$$\frac{d\tilde{x}(t)}{dt} = \Lambda (\tilde{x}(t) + \tilde{y}(t))(X - \tilde{x}(t)),$$
$$\frac{d\tilde{y}(t)}{dt} = 0.$$

We define

$$\tilde{\tau}^K_{-\nu} = \inf\{t \ge \tau_\nu : \tilde{z}^K(t) \notin S_{-\nu}\},$$
$$\tilde{\tau}_{-\nu} = \inf\{t \ge \tau_\nu : \tilde{z}(t) \notin S_{-\nu}\}.$$

Since

$$\Lambda Y_0 (X - X_\alpha) \le \frac{d\tilde{x}}{dt} \le \Lambda,$$

the lower bound implies that there is a strictly positive increase in $\tilde{x}$ after time $\tau_\nu$. Since $\phi(x,y)$ decreases with increasing $x$ at a rate bounded away from 0 (see (20)), $\tilde{z}(t)$ must exit $S_{-\nu}$ within a short additional duration. Thus, we have that

$$\tilde{\tau}_{-\nu} - \tau_\nu \le b\nu$$

for a suitably chosen $b < \infty$.

To aid the reader, we summarize the variables used in Table I. We also illustrate sample trajectories of a controlled CTMC and the corresponding ODE via an example (Figure 7). We choose $M = 40$, $N = 160$, $\alpha = 0.8$, $N_0 = 40$, $\lambda = 0.00025$ and $\gamma = 0.25$. We plot the graphs of '$\phi(x,y) = \nu$' and '$\phi(x,y) = -\nu$' for $\nu = 0.2$. We also show the trajectories "$y^K$ vs $x^K$", "$y$ vs $x$", "$\tilde{y}$ vs $\tilde{x}$" and the associated epochs.

TABLE I
Variables and their description

variable                      description
$z^K(t)$                      controlled dynamics, with a discontinuity when copying to relays stops
$z(t)$                        $z^K(t)$'s fluid limit with discontinuity at $\tau^*$
$\tau^K_\nu$                  instant when $z^K(t)$ exits $S_\nu$
$\tau_\nu$                    instant when $z(t)$ exits $S_\nu$
$\bar{z}^K(t)$                uncontrolled dynamics with no discontinuity
$\bar{z}(t)$                  $\bar{z}^K(t)$'s fluid limit with no discontinuity
$\tilde{z}^K(t)$              identical to $z^K(t)$ until $\tau_\nu$, at which copying to relays is stopped
$\tilde{z}(t)$                $\tilde{z}^K(t)$'s fluid limit with discontinuity at $\tau_\nu$
$\tilde{\tau}^K_{-\nu}$       instant when $\tilde{z}^K(t)$ exits $S_{-\nu}$
$\tilde{\tau}_{-\nu}$         instant when $\tilde{z}(t)$ exits $S_{-\nu}$

Fig. 7. An illustration of the trajectories of the controlled CTMC and the corresponding ODE, and the associated variables.

We prove the assertion in Theorem 4.1 in three steps: (a) over $[0, \tau_\nu]$, (b) over $[\tau_\nu, \tilde{\tau}_{-\nu}]$ and (c) over $[\tilde{\tau}_{-\nu}, T]$. However, we also need the following lemmas in our proof.

Lemma B.1: For every $\epsilon > 0$, there exists a $\bar{u}$ such that for all $t \ge 0$, $0 \le u \le \bar{u}$,

$$P\left( \|\bar{z}^K(t+u) - \bar{z}^K(t)\| > \epsilon \right) = O(K^{-1}).$$
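Proof: Observe that

$$\|\bar{z}^K(t+u) - \bar{z}^K(t)\| \le \|\bar{z}^K(t) - \bar{z}(t)\| + \|\bar{z}^K(t+u) - \bar{z}(t+u)\| + \|\bar{z}(t) - \bar{z}(t+u)\| \le \|\bar{z}^K(t) - \bar{z}(t)\| + \|\bar{z}^K(t+u) - \bar{z}(t+u)\| + \sqrt{2}\,\Lambda u.$$

Hence, for all $t \ge 0$, $u \ge 0$,

$$P\left( \|\bar{z}^K(t+u) - \bar{z}^K(t)\| > \sqrt{2}\,\Lambda u + \tfrac{\epsilon}{2} \right) \le P\left( \|\bar{z}^K(t) - \bar{z}(t)\| + \|\bar{z}^K(t+u) - \bar{z}(t+u)\| > \tfrac{\epsilon}{2} \right) \le P\left( \sup_{t \le s \le t+u} \|\bar{z}^K(s) - \bar{z}(s)\| > \tfrac{\epsilon}{4} \right) = O(K^{-1}),$$

where the last equality follows from [11, Theorem 2.8]. Setting $\bar{u} = \frac{\epsilon}{2\sqrt{2}\,\Lambda}$, for all $t \ge 0$, $0 \le u \le \bar{u}$,

$$P\left( \|\bar{z}^K(t+u) - \bar{z}^K(t)\| > \epsilon \right) \le P\left( \|\bar{z}^K(t+u) - \bar{z}^K(t)\| > \sqrt{2}\,\Lambda u + \tfrac{\epsilon}{2} \right) = O(K^{-1}). \; ■$$

Lemma B.2: Suppose $u$ is a fixed time and $u^K$ is a random time that satisfies $P(|u - u^K| > \delta) = O(K^{-1})$ for every $\delta > 0$. Then, for every $\epsilon > 0$,

$$P\left( \|\bar{z}^K(u) - \bar{z}^K(u \wedge u^K)\| > \epsilon \right) = O(K^{-1}).$$

Proof: Fix a $\delta > 0$. Then,

$$P\left( \|\bar{z}^K(u) - \bar{z}^K(u \wedge u^K)\| > \epsilon \right) = P(u - u^K > \delta)\, P\left( \|\bar{z}^K(u) - \bar{z}^K(u \wedge u^K)\| > \epsilon \mid u - u^K > \delta \right) + P(u - u^K \le \delta)\, P\left( \|\bar{z}^K(u) - \bar{z}^K(u \wedge u^K)\| > \epsilon \mid u - u^K \le \delta \right)$$
$$\le O(K^{-1}) + P\left( \|\bar{z}^K(u) - \bar{z}^K(u \wedge u^K)\| > \epsilon \mid u - u^K \le \delta \right) \le O(K^{-1}) + P\left( \|\bar{z}^K(u) - \bar{z}^K(u - \delta)\| > \epsilon \mid u - u^K \le \delta \right),$$

where the last inequality holds because $\bar{z}^K(t)$ is a monotone increasing function. Setting $\delta = \bar{u}$ (see Lemma B.1),

$$P\left( \|\bar{z}^K(u) - \bar{z}^K(u \wedge u^K)\| > \epsilon \right) \le O(K^{-1}) + P\left( \|\bar{z}^K(u) - \bar{z}^K(u - \bar{u})\| > \epsilon \mid u - u^K \le \bar{u} \right) = O(K^{-1}) + P\left( \|\bar{z}^K(u) - \bar{z}^K(u - \bar{u})\| > \epsilon \right) \le O(K^{-1}) + O(K^{-1}),$$

where the last inequality follows from Lemma B.1. ■

Following is the proof of Theorem 4.1.

(a) First, we prove the convergence of $z^K(t)$ to $z(t)$ over $[0, \tau_\nu]$. Fix a $\nu > 0$. Then Corollary B.1 implies that $z^K(t)$ converges to $z(t)$ in the region $S_\nu$. Following [11, Theorem 2.8] we have, for all $\epsilon, \delta > 0$,

$$P\left( \sup_{0 \le t \le \tau_\nu} \|z^K(t \wedge \tau^K_\nu) - z(t)\| > \epsilon \right) = O(K^{-1}) \quad (21a)$$
$$\text{and } P\left( |\tau^K_\nu - \tau_\nu| > \delta \right) = O(K^{-1}). \quad (21b)$$

Since, for all $t \ge 0$,

$$\|z^K(t) - z(t)\| \le \|z^K(t \wedge \tau^K_\nu) - z(t)\| + \|z^K(t) - z^K(t \wedge \tau^K_\nu)\|,$$

we obtain

$$\sup_{0 \le t \le \tau_\nu} \|z^K(t) - z(t)\| \le \sup_{0 \le t \le \tau_\nu} \|z^K(t \wedge \tau^K_\nu) - z(t)\| + \sup_{0 \le t \le \tau_\nu} \|z^K(t) - z^K(t \wedge \tau^K_\nu)\|.$$

If the left side is larger than $\epsilon$, at least one of the two terms on the right side is larger than $\epsilon/2$, and so by the union bound, we get

$$P\left( \sup_{0 \le t \le \tau_\nu} \|z^K(t) - z(t)\| > \epsilon \right) \le P\left( \sup_{0 \le t \le \tau_\nu} \|z^K(t \wedge \tau^K_\nu) - z(t)\| > \tfrac{\epsilon}{2} \right) + P\left( \sup_{0 \le t \le \tau_\nu} \|z^K(t) - z^K(t \wedge \tau^K_\nu)\| > \tfrac{\epsilon}{2} \right)$$
$$\le O(K^{-1}) + P\left( \|z^K(\tau_\nu) - z^K(\tau_\nu \wedge \tau^K_\nu)\| > \tfrac{\epsilon}{2} \right), \quad (22)$$

where the first term in the last inequality follows from (21a). Also, from Corollary B.1, for $K > K_\nu$, $\phi^K(z) > 0$ for all $z \in S_\nu$, i.e., the process $z^K(t)$ follows uncontrolled dynamics until $\tau^K_\nu$. Thus, for $K > K_\nu$, $z^K(\tau^K_\nu) = \bar{z}^K(\tau^K_\nu)$ and

$$\|z^K(\tau_\nu) - z^K(\tau_\nu \wedge \tau^K_\nu)\| \le \|\bar{z}^K(\tau_\nu) - \bar{z}^K(\tau_\nu \wedge \tau^K_\nu)\|$$

sample path wise. The inequality is an equality if $\tau_\nu \le \tau^K_\nu$; both sides equal 0 in this case. Otherwise, it is an inequality because the possible change in dynamics of $z^K(t)$ after $\tau^K_\nu$ makes it increase (in both its components) at a slower pace than the uncontrolled $\bar{z}^K(t)$. Thus

$$P\left( \|z^K(\tau_\nu) - z^K(\tau_\nu \wedge \tau^K_\nu)\| > \tfrac{\epsilon}{2} \right) \le P\left( \|\bar{z}^K(\tau_\nu) - \bar{z}^K(\tau_\nu \wedge \tau^K_\nu)\| > \tfrac{\epsilon}{2} \right) = O(K^{-1}),$$

where the last step follows from (21b) and Lemma B.2. Using this in (22) we get

$$P\left( \sup_{0 \le t \le \tau_\nu} \|z^K(t) - z(t)\| > \epsilon \right) \le O(K^{-1}) + O(K^{-1}) = O(K^{-1}).$$

(b) Now we prove the convergence of $z^K(t)$ to $z(t)$ over $[\tau_\nu, \tilde{\tau}_{-\nu}]$. Observe that, for $t \in [\tau_\nu, \tilde{\tau}_{-\nu}]$,

$$\|z^K(t) - z(t)\| \le \|z^K(\tau_\nu) - z(\tau_\nu)\| + \|z^K(t) - z^K(\tau_\nu)\| + \|z(t) - z(\tau_\nu)\|.$$

Hence,

$$\sup_{\tau_\nu \le t \le \tilde{\tau}_{-\nu}} \|z^K(t) - z(t)\| \le \|z^K(\tau_\nu) - z(\tau_\nu)\| + \sup_{\tau_\nu \le t \le \tilde{\tau}_{-\nu}} \|z^K(t) - z^K(\tau_\nu)\| + \sup_{\tau_\nu \le t \le \tilde{\tau}_{-\nu}} \|z(t) - z(\tau_\nu)\|$$
$$= \|z^K(\tau_\nu) - z(\tau_\nu)\| + \|z^K(\tilde{\tau}_{-\nu}) - z^K(\tau_\nu)\| + \|z(\tilde{\tau}_{-\nu}) - z(\tau_\nu)\|$$
$$\le \|z^K(\tau_\nu) - z(\tau_\nu)\| + \|z^K(\tilde{\tau}_{-\nu}) - z^K(\tau_\nu)\| + \sqrt{2}\,\Lambda b \nu,$$

where the equality follows because $z^K(t)$ and $z(t)$ are nondecreasing. The last inequality holds because $\|dz/dt\| \le \|d\bar{z}/dt\| \le \sqrt{2}\,\Lambda$ and $\tilde{\tau}_{-\nu} - \tau_\nu \le b\nu$. Moreover,

$$P\left( \sup_{\tau_\nu \le t \le \tilde{\tau}_{-\nu}} \|z^K(t) - z(t)\| > \sqrt{2}\,\Lambda b \nu + \tfrac{\epsilon}{2} \right) \le P\left( \|z^K(\tau_\nu) - z(\tau_\nu)\| > \tfrac{\epsilon}{4} \right) + P\left( \|z^K(\tilde{\tau}_{-\nu}) - z^K(\tau_\nu)\| > \tfrac{\epsilon}{4} \right)$$
$$= O(K^{-1}) + P\left( \|z^K(\tilde{\tau}_{-\nu}) - z^K(\tau_\nu)\| > \tfrac{\epsilon}{4} \right),$$

where the equality follows from the result of part (a). We now redefine the Markov process $\bar{z}^K(t) = (\bar{x}^K(t), \bar{y}^K(t))$ for $t \ge \tau_\nu$ to be the uncontrolled dynamics with initial condition $\bar{z}^K(\tau_\nu) = z^K(\tau_\nu)$. Again, it can be easily observed that

$$\|z^K(\tilde{\tau}_{-\nu}) - z^K(\tau_\nu)\| \le \|\bar{z}^K(\tilde{\tau}_{-\nu}) - \bar{z}^K(\tau_\nu)\|.$$

Thus

$$P\left( \sup_{\tau_\nu \le t \le \tilde{\tau}_{-\nu}} \|z^K(t) - z(t)\| > \sqrt{2}\,\Lambda b \nu + \tfrac{\epsilon}{2} \right) \le O(K^{-1}) + P\left( \|\bar{z}^K(\tilde{\tau}_{-\nu}) - \bar{z}^K(\tau_\nu)\| > \tfrac{\epsilon}{4} \right) \le O(K^{-1}) + P\left( \|\bar{z}^K(\tau_\nu + b\nu) - \bar{z}^K(\tau_\nu)\| > \tfrac{\epsilon}{4} \right).$$

Set $\nu = \min\{\frac{\epsilon}{2\sqrt{2}\,\Lambda b}, \frac{\bar{u}}{b}\}$, and apply Lemma B.1 to get

$$P\left( \sup_{\tau_\nu \le t \le \tilde{\tau}_{-\nu}} \|z^K(t) - z(t)\| > \epsilon \right) \le P\left( \sup_{\tau_\nu \le t \le \tilde{\tau}_{-\nu}} \|z^K(t) - z(t)\| > \sqrt{2}\,\Lambda b \nu + \tfrac{\epsilon}{2} \right) \le O(K^{-1}) + O(K^{-1}) = O(K^{-1}).$$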

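(c) Finally, we prove the convergence of $z^K(t)$ to $z(t)$ over $[\tilde{\tau}_{-\nu}, T]$. Reconsider the process $\tilde{z}^K(t)$, $t \ge \tau_\nu$, and the associated function $\tilde{z}(t)$. Recall that, for any $\nu > 0$, $\tilde{z}^K(t)$ and $\tilde{z}(t)$ exit $S_{-\nu}$ at $\tilde{\tau}^K_{-\nu}$ and $\tilde{\tau}_{-\nu}$ respectively. Clearly, $\tilde{\tau}_{-\nu/2} < \tilde{\tau}_{-\nu}$; say $\tilde{\tau}_{-\nu} - \tilde{\tau}_{-\nu/2} = \delta$. Also, using [11, Theorem 2.8],

$$P\left( |\tilde{\tau}^K_{-\nu/2} - \tilde{\tau}_{-\nu/2}| > \delta \right) = O(K^{-1}),$$

i.e.,

$$P\left( \tilde{\tau}^K_{-\nu/2} > \tilde{\tau}_{-\nu} \right) = O(K^{-1}).$$

Furthermore, we have that $\tau^K_{-\nu/2} \le \tilde{\tau}^K_{-\nu/2}$ sample path wise. The inequality holds because $z^K(t)$ may continue to increase (in both its components) at a higher pace than $\tilde{z}^K(t)$ even after $\tau_\nu$. Thus

$$P\left( \tau^K_{-\nu/2} > \tilde{\tau}_{-\nu} \right) = O(K^{-1}),$$

implying that the probability that $z^K(t)$ has changed its dynamics by $\tilde{\tau}_{-\nu}$ approaches 1 as $K$ approaches $\infty$. In these realizations, the dynamics of $z^K(t)$ and $z(t)$ match for $t \ge \tilde{\tau}_{-\nu}$. We restrict ourselves to only these realizations. We also have from part (b) that, for every $\epsilon > 0$,

$$P\left( \|z^K(\tilde{\tau}_{-\nu}) - z(\tilde{\tau}_{-\nu})\| > \epsilon \right) = O(K^{-1}).$$

Once more using [11, Theorem 2.8], for any $\epsilon, \delta > 0$,

$$P\left( \sup_{\tilde{\tau}_{-\nu} \le t \le T} \|z^K(t) - z(t)\| > \epsilon \right) = O(K^{-1})$$
$$\text{and } P\left( |\tau^K - \tau| > \delta \right) = O(K^{-1}). \; ■$$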



Appendix C 
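Proof of Theorem 4.2

For the optimal policy $u^*$, the total expected cost is

$$K\,E\{T_d + \gamma E_c\} = E^K_{u^*}\{\tau^K + \Gamma (X + y^K(\tau^K))\}$$

by the definitions in (11); we use the subscript $u^*$ to show dependence of the probability law on the underlying policy. Under the deterministic policy $u^\infty$, copying to relays is stopped at the deterministic time instant $\tau^* \le \tau$, implying $y^K(\tau^*) = y^K(\tau)$. Thus, the total expected cost is

$$K\,E\{T_d + \gamma E_c\} = E^K_{u^\infty}\{\tau^K + \Gamma (X + y^K(\tau))\}.$$

Also observe that for $(x^K(t), y^K(t))$ under $u^\infty$, the corresponding fluid limits are the same deterministic dynamics $(x(t), y(t))$ defined in Section IV-A (i.e., solutions of (9a)-(9b)). $(x^K(t), y^K(t))$ and $(x(t), y(t))$ satisfy the hypotheses assumed in Darling [11] over the intervals $[0, \tau^*]$ and $[\tau^*, \infty)$. Thus [11, Theorem 2.8] applies, and we conclude^3

$$\lim_{K \to \infty} P^K_{u^\infty}\left( \sup_{0 \le t \le T} \|(x^K(t), y^K(t)) - (x(t), y(t))\| > \epsilon \right) = 0,$$
$$\lim_{K \to \infty} P^K_{u^\infty}\left( |\tau^K - \tau| > \delta \right) = 0.$$

^3 Applying [11, Theorem 2.8] over $[0, \tau^*]$ yields convergence of the state at $\tau^*$, which is a necessary condition to apply [11, Theorem 2.8] over $[\tau^*, \infty)$.

Furthermore, it can be easily shown that under both the controls $u^*$ and $u^\infty$, the delivery delays $\tau^K$ have second moments that are bounded uniformly over all $K$. To see this, consider a policy $u^0$ that never copies to relays. Clearly,

$$E^K_{u^*}(\tau^K)^2 \le E^K_{u^0}(\tau^K)^2,$$
$$E^K_{u^\infty}(\tau^K)^2 \le E^K_{u^0}(\tau^K)^2$$

for each $K$. Then it suffices to show that

$$\sup_K E^K_{u^0}(\tau^K)^2 < \infty. \quad (23)$$

Note that

$$\tau^K = \sum_{m=0}^{M^K_\alpha - 1} \delta_m,$$

where $\delta_m$ is the time duration for which $m(t) = m$; $\delta_m$, $m = 0, 1, \dots$ are independent, and $\delta_m$ is exponentially distributed with mean $\frac{1}{\lambda^K (m + N_0^K)(M^K - m)}$ under policy $u^0$. Thus

$$E^K_{u^0} \tau^K = \sum_{m=0}^{M^K_\alpha - 1} \frac{1}{\lambda^K (m + N_0^K)(M^K - m)} \le \frac{M^K_\alpha}{\lambda^K N_0^K (M^K - M^K_\alpha)} = \frac{X_\alpha}{\Lambda Y_0 (X - X_\alpha)} < \infty.$$

Similarly,

$$\mathrm{Var}^K_{u^0} \tau^K = \sum_{m=0}^{M^K_\alpha - 1} \frac{1}{\left( \lambda^K (m + N_0^K)(M^K - m) \right)^2} \le \frac{M^K_\alpha}{\left( \lambda^K N_0^K (M^K - M^K_\alpha) \right)^2} = \frac{X_\alpha}{K \Lambda^2 Y_0^2 (X - X_\alpha)^2} \to 0$$

as $K \to \infty$. These results together imply (23).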
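The two bounds above are easy to check numerically. The sketch below computes the exact mean and variance of the delivery delay under the never-copy policy as sums over the exponential sojourn times, together with the corresponding upper bounds. The scaling $M^K = KX$, $N_0^K = KY_0$, $\lambda^K = \Lambda/K$ is assumed here, the parameter values are illustrative, and the function name is hypothetical.

```python
def never_copy_delay_moments(K, X=0.2, Y0=0.2, alpha=0.8, Lam=0.05):
    """Mean and variance of the delivery delay when relays are never copied.

    Returns (mean, variance, mean_bound, variance_bound).
    """
    M = round(K * X)            # number of destinations
    N0 = round(K * Y0)          # relays initially carrying the packet
    M_alpha = round(alpha * M)  # destinations that must be reached
    lam = Lam / K               # assumed scaling of the contact rate
    # Sojourn time in state m is exponential with this rate.
    rates = [lam * (m + N0) * (M - m) for m in range(M_alpha)]
    mean = sum(1.0 / r for r in rates)
    variance = sum(1.0 / r ** 2 for r in rates)  # independent exponentials
    slowest = lam * N0 * (M - M_alpha)           # lower bound on every rate
    return mean, variance, M_alpha / slowest, M_alpha / slowest ** 2
```

For example, calling the function with `K` equal to 50, 100 and 200 should show the mean staying below a bound that does not depend on `K`, while the variance shrinks as `K` grows.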

Following [19, Remark 9.5.1], $\tau^K$ under both $u^*$ and $u^\infty$ are uniformly integrable. Since $\tau^K$, under both $u^*$ and $u^\infty$, converge to $\tau$ in probability and hence in distribution, [19, Theorem 9.5.1] yields

$$\lim_{K \to \infty} E^K_{u^*} \tau^K = \lim_{K \to \infty} E^K_{u^\infty} \tau^K = \tau. \quad (24)$$

Next, it is easy to show that under the control $u^*$, $y^K(\tau^K)$ converges to $y(\tau)$ in probability. To see this, observe that

$$|y^K(\tau^K) - y(\tau)| \le |y^K(\tau^K) - y^K(\tau)| + |y^K(\tau) - y(\tau)|. \quad (25)$$

From Theorem 4.1, $y^K(\tau)$ and $\tau^K$ converge to $y(\tau)$ and $\tau$ respectively, in probability. The latter result, along with arguments similar to those in the proof of Lemma B.2, implies that

$$P\left( |y^K(\tau^K) - y^K(\tau)| > \epsilon \right) = O(K^{-1})$$

for every $\epsilon > 0$. Using these facts in (25), we conclude that

$$P\left( |y^K(\tau^K) - y(\tau)| > \epsilon \right) = O(K^{-1})$$

for every $\epsilon > 0$. Since $y^K(\tau^K)$ is bounded, and hence uniformly integrable, [19, Theorem 9.5.1] implies that

$$\lim_{K \to \infty} E^K_{u^*} y^K(\tau^K) = y(\tau). \quad (26)$$

Similarly, under the control $u^\infty$ also, $y^K(\tau)$ is bounded, and hence is uniformly integrable. It also converges to $y(\tau)$ in probability. Once more using [19, Theorem 9.5.1], we get

$$\lim_{K \to \infty} E^K_{u^\infty} y^K(\tau) = y(\tau). \quad (27)$$

Combining (26) and (27),

$$\lim_{K \to \infty} E^K_{u^*} y^K(\tau^K) = \lim_{K \to \infty} E^K_{u^\infty} y^K(\tau). \quad (28)$$

Finally, combining (24) and (28), we get that

$$\lim_{K \to \infty} E^K_{u^*}\{T_d + \gamma E_c\} = \lim_{K \to \infty} E^K_{u^\infty}\{T_d + \gamma E_c\} = \tau + \Gamma y(\tau^*).$$

Appendix D 

The Hamiltonian Formulation and The Solution 

In this section we consider the limiting deterministic (fluid) system and study its optimal control. The limiting controlled system is: $x(0) = 0$, $y(0) = Y_0$, and for $t \ge 0$,

$$\frac{dx(t)}{dt} = \Lambda (x(t) + y(t))(X - x(t)), \quad (29a)$$
$$\frac{dy(t)}{dt} = \Lambda (x(t) + y(t))(Y - y(t))\,u(t), \quad (29b)$$

where $u(t) \in [0, 1]$ is the control at time $t$. Our objective is to minimize

$$\Gamma y(T) + T = \Gamma y(T) + \int_0^T 1 \, dt, \quad (30)$$

where $T$ is the terminal time when $x(T) = X_\alpha$; dependence of $T$ on the underlying control is understood, and is not shown explicitly.

Theorem D.1: The optimal policy for the deterministic system (29a)-(29b) with cost (30) is

$$u^*(t) = 1_{[0, \tau^*]}(t)$$

with $\tau^*$ as in (13). Furthermore, the optimal cost is $\tau + \Gamma y(\tau^*)$ with $\tau$ as in (12).

Proof: Following [18, Section 3.3.1], we define the Hamiltonian for the system

$$H(x, y, u, p_1, p_2) = 1 + p_1 \Lambda (X - x)(x + y) + p_2 \Lambda (Y - y)(x + y) u = 1 + \Lambda (x + y)\left[ p_1 (X - x) + p_2 (Y - y) u \right], \quad (31)$$

where $p_i : [0, T^*] \to \mathbb{R}$, $i = 1, 2$, are the costate functions associated with $x(t)$ and $y(t)$ respectively. Let $u^*(t)$, $t \ge 0$, be an optimal control trajectory. Let $T^*$ be the corresponding terminal time, and let $(x^*(t), y^*(t))$, $t \in [0, T^*]$, be the corresponding state trajectory.

a) Adjoint equations: By [18, Section 3.3.1, Proposition 3.1], the functions $p_i(t)$ are solutions of the following adjoint equations:

$$\frac{dp_1(t)}{dt} = -\left. \frac{\partial}{\partial x} H(x, y^*, u^*, p_1, p_2) \right|_{x = x^*} = -\Lambda \left[ p_1(t)(X - 2x^*(t) - y^*(t)) + p_2(t)(Y - y^*(t))\,u^*(t) \right], \quad (32)$$
$$\frac{dp_2(t)}{dt} = -\left. \frac{\partial}{\partial y} H(x^*, y, u^*, p_1, p_2) \right|_{y = y^*} = -\Lambda \left[ p_1(t)(X - x^*(t)) + p_2(t)(Y - x^*(t) - 2y^*(t))\,u^*(t) \right]. \quad (33)$$

b) Boundary condition: Observe that the terminal cost is $\Gamma y^*(T^*)$. Thus, by [18, Section 3.3.1, Proposition 3.1],

$$p_2(T^*) = \left. \frac{\partial}{\partial y} (\Gamma y) \right|_{y = y^*(T^*)} = \Gamma. \quad (34)$$

c) Minimum principle: Moreover, the optimal control $u^*$ satisfies

$$u^*(t) = \arg\min_{u \in [0,1]} H(x^*(t), y^*(t), u, p_1(t), p_2(t))$$

for all $t \in [0, T^*]$. From (31), it is immediate that the optimal policy is a bang-bang policy:

$$u^*(t) = \begin{cases} 1, & \text{if } p_2(t) < 0, \\ 0, & \text{if } p_2(t) > 0. \end{cases} \quad (35)$$

In particular, our observation (34) implies that $u^*(T^*) = 0$.
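The bang-bang structure follows from the fact that the Hamiltonian is affine in $u$. A short worked step, written out under the same notation, makes this explicit:

```latex
\begin{align*}
\frac{\partial H}{\partial u}(x, y, u, p_1, p_2)
  &= \Lambda (x + y)(Y - y)\, p_2 , \\
\text{so, since } \Lambda (x + y)(Y - y) > 0 \text{ for } y < Y: \quad
\arg\min_{u \in [0,1]} H
  &= \begin{cases}
       1, & p_2 < 0 \quad (H \text{ decreasing in } u), \\
       0, & p_2 > 0 \quad (H \text{ increasing in } u).
     \end{cases}
\end{align*}
```

When $p_2 = 0$ the Hamiltonian does not depend on $u$, so the minimum principle alone does not pin down the control at such instants.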

d) Free terminal time condition: Since the terminal time is free, we also have from [18, Section 3.4.3] that

$$H(x^*(t), y^*(t), u^*(t), p_1(t), p_2(t)) = 0$$

for all $t \in [0, T^*]$. In particular, equality at $t = T^*$ implies (see (31))

$$1 + \Lambda (X_\alpha + y^*(T^*))\, p_1(T^*)(X - X_\alpha) = 0.$$

Since $X - X_\alpha > 0$, we must have

$$p_1(T^*) < 0. \quad (36)$$

We will find this observation useful later.

Our characterization of the optimal control consists of two steps. First we show that the optimal control trajectory is of threshold type, i.e.,

$$u^*(t) = \begin{cases} 1, & \text{if } t \in [0, t^*], \\ 0, & \text{if } t \in (t^*, T^*]. \end{cases} \quad (37)$$

This is done in the next subsection. In the subsequent subsection, we obtain the threshold $t^*$.

A. Optimal control is of threshold type 

We show that $p_2(t)$ is negative for $t \in [0, t^*)$ and strictly positive for $t \in (t^*, T^*]$ for some $t^* \ge 0$. It then follows from (35) that $u^*$ is as in (37). Recall the expression for $dp_1(t)/dt$ in (32). We consider two scenarios.

1) Case 1: Let $X - 2X_\alpha - y^*(T^*) \ge 0$. Since $x^*(t)$ and $y^*(t)$ both are non-decreasing in $t$, we have

$$X - 2x^*(t) - y^*(t) \ge 0 \text{ for all } t \in [0, T^*].$$

Moreover, from (35),

$$p_2(t)\,u^*(t) \le 0 \text{ for all } t \in [0, T^*]$$

with equality at $t = T^*$. Thus, from (32),

$$\frac{dp_1(t)}{dt} \ge 0$$

for all $t \in [0, T^*]$ at which $p_1(t) \le 0$. But, using the observation $p_1(T^*) < 0$ (see (36)), it immediately follows that

$$\frac{dp_1(t)}{dt} \ge 0 \text{ for all } t \in [0, T^*],$$

and so, $p_1(t) < 0$ for all $t \in [0, T^*]$. Now, from (33),

$$\frac{dp_2(t)}{dt} \ge 0$$

for all $t \in [0, T^*]$ at which $p_2(t) \ge 0$. Again, using the observation $p_2(T^*) = \Gamma > 0$ (see (34)), it follows that either $p_2(t) > 0$ for all $t \in [0, T^*]$, or there exists a $t^* \in [0, T^*]$ such that $p_2(t^*) = 0$, and

$$p_2(t) \begin{cases} < 0, & \text{if } t \in [0, t^*), \\ > 0, & \text{if } t \in (t^*, T^*]. \end{cases}$$

2) Case 2: Let $X - 2X_\alpha - y^*(T^*) < 0$. Observe that $X - 2x^*(t) - y^*(t)$ is decreasing in $t$. Thus, tracing back from $t = T^*$, there exists a $t_1$ such that $X - 2x^*(t_1) - y^*(t_1) = 0$; we set $t_1 = 0$ if $X - 2x^*(t) - y^*(t) < 0$ for all $t \in [0, T^*]$. Clearly, $X - 2x^*(t) - y^*(t) \le 0$ for all $t \in [t_1, T^*]$.

We claim that $p_1(t) < 0$ for all $t \in [t_1, T^*]$. Suppose not, i.e., there exists a $t_2 \in [t_1, T^*]$ such that $p_1(t_2) \ge 0$. Then, from (32),

$$\frac{dp_1(t)}{dt} \ge 0 \text{ for all } t \in [t_2, T^*],$$

and so, $p_1(t)$ increases with $t$ in this interval. But this contradicts the assertion in (36) that $p_1(T^*) < 0$. Hence the claim holds.

Now, $X - 2x^*(t_1) - y^*(t_1) = 0$, and $p_1(t_1) < 0$. An argument similar to that in Case 1 yields that

$$\frac{dp_1(t)}{dt} \ge 0 \text{ for all } t \in [0, t_1],$$

and so, $p_1(t) < 0$ for all $t \in [0, T^*]$; recall that we have already seen that $p_1(t) < 0$ for all $t \in [t_1, T^*]$. Consequently, as in Case 1, either $p_2(t) > 0$ for all $t \in [0, T^*]$, or there exists a $t^* \in [0, T^*]$ such that $p_2(t^*) = 0$, and

$$p_2(t) \begin{cases} < 0, & \text{if } t \in [0, t^*), \\ > 0, & \text{if } t \in (t^*, T^*]. \end{cases}$$

To summarize, in both the cases there exists a $t^* \in [0, T^*]$ such that

$$p_2(t) \begin{cases} < 0, & \text{if } t \in [0, t^*), \\ > 0, & \text{if } t \in (t^*, T^*]. \end{cases}$$

B. Optimum Threshold 

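We now characterize the optimal threshold $t^*$. Consider a threshold policy

$$u(t) = \begin{cases} 1, & \text{if } t \in [0, \bar{t}\,], \\ 0, & \text{if } t \in (\bar{t}, T]. \end{cases}$$

Let the corresponding state trajectory be $(x^{\bar{t}}(t), y^{\bar{t}}(t))$, $t \ge 0$, and let the terminal time be $T(\bar{t})$. Let $\bar{x} := x^{\bar{t}}(\bar{t})$ and $\bar{y} := y^{\bar{t}}(\bar{t})$ be the values at the threshold time $\bar{t}$. Clearly,

$$\frac{d\bar{x}}{d\bar{t}} = \Lambda (\bar{x} + \bar{y})(X - \bar{x}). \quad (38)$$

The associated cost is

$$C(\bar{t}) = T(\bar{t}) + \Gamma \bar{y}, \quad (39)$$

and the optimal threshold is $t^* = \arg\min_{\bar{t}} C(\bar{t})$. For any $\bar{t} \ge 0$ and $t \in (\bar{t}, \infty)$,

$$\frac{dx^{\bar{t}}(t)}{dt} = \Lambda (x^{\bar{t}}(t) + \bar{y})(X - x^{\bar{t}}(t)),$$

and so

$$T(\bar{t}) = \bar{t} + \frac{1}{\Lambda} \int_{\bar{x}}^{X_\alpha} \frac{dz}{(z + \bar{y})(X - z)}.$$

Its substitution in (39) yields

$$C(\bar{t}) = \bar{t} + \frac{1}{\Lambda} \int_{\bar{x}}^{X_\alpha} \frac{dz}{(z + \bar{y})(X - z)} + \Gamma \bar{y}.$$

Using the Leibniz rule of differentiation, we get

$$\frac{dC(\bar{t})}{d\bar{t}} = 1 + \Gamma \frac{d\bar{y}}{d\bar{t}} + \frac{1}{\Lambda} \left[ -\frac{d\bar{x}}{d\bar{t}} \cdot \frac{1}{(\bar{x} + \bar{y})(X - \bar{x})} - \frac{d\bar{y}}{d\bar{t}} \int_{\bar{x}}^{X_\alpha} \frac{dz}{(z + \bar{y})^2 (X - z)} \right] = \frac{d\bar{y}}{d\bar{t}} \left[ \Gamma - \frac{1}{\Lambda} \int_{\bar{x}}^{X_\alpha} \frac{dz}{(z + \bar{y})^2 (X - z)} \right],$$

where the last equality uses (38). Defining

$$g(\bar{t}) := \Gamma - \frac{1}{\Lambda} \int_{\bar{x}}^{X_\alpha} \frac{dz}{(z + \bar{y})^2 (X - z)},$$

we get

$$\frac{dC(\bar{t})}{d\bar{t}} = \frac{d\bar{y}}{d\bar{t}}\, g(\bar{t}).$$

Note that

$$\frac{d\bar{y}}{d\bar{t}} > 0, \quad \frac{d\bar{x}}{d\bar{t}} \ge \Lambda Y_0 (X - X_\alpha) > 0,$$

and so $g(\bar{t})$ is also strictly increasing in $\bar{t}$ with slope bounded away from 0. Thus, the optimal threshold is given by

$$t^* = \begin{cases} 0, & \text{if } g(0) \ge 0, \\ g^{-1}(0), & \text{otherwise}, \end{cases}$$

which is identical to $\tau^*$ in (13). ■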
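The threshold characterization can be sanity-checked numerically. The sketch below sweeps the candidate threshold $\bar{t}$ along the copy-always trajectory of (29a)-(29b), evaluates the cost $C(\bar{t})$ using the closed-form antiderivative of $1/((z+\bar{y})(X-z))$, and compares the minimizer with the first time at which $g(\bar{t}) \ge 0$. The parameter values are taken from the two-hop example purely for illustration, the Euler step is an arbitrary choice, and all function names are hypothetical.

```python
import math

X, Y, ALPHA, Y0, LAM, GAM = 0.2, 0.8, 0.8, 0.2, 0.05, 50.0
X_ALPHA = ALPHA * X


def remaining_time(x, y):
    """(1/LAM) * integral_x^{X_ALPHA} dz / ((z+y)(X-z)), via the antiderivative
    log((z+y)/(X-z)) / (X+y)."""
    def prim(z):
        return math.log((z + y) / (X - z)) / (X + y)
    return (prim(X_ALPHA) - prim(x)) / LAM


def g(x, y, n=100):
    """GAM - (1/LAM) * integral_x^{X_ALPHA} dz / ((z+y)^2 (X-z)) (midpoint rule)."""
    h = (X_ALPHA - x) / n
    total = 0.0
    for i in range(n):
        z = x + (i + 0.5) * h
        total += 1.0 / ((z + y) ** 2 * (X - z))
    return GAM - total * h / LAM


def scan_thresholds(dt=0.02, t_max=300.0):
    """Return (argmin of C over the scanned thresholds, first time with g >= 0)."""
    x, y, t = 0.0, Y0, 0.0
    best_t, best_cost, root_t = 0.0, float("inf"), None
    while t < t_max and x < X_ALPHA:
        cost = t + remaining_time(x, y) + GAM * y  # C(t) for threshold t
        if cost < best_cost:
            best_t, best_cost = t, cost
        if root_t is None and g(x, y) >= 0.0:
            root_t = t
        # Copy-always dynamics up to the candidate threshold (forward Euler).
        dx = LAM * (x + y) * (X - x)
        dy = LAM * (x + y) * (Y - y)
        x, y, t = x + dt * dx, y + dt * dy, t + dt
    return best_t, root_t
```

Under these assumptions the two returned times should agree up to the discretization step, which is what the relation $dC/d\bar{t} = (d\bar{y}/d\bar{t})\,g(\bar{t})$ predicts.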
Remarks D.1: Combined with Theorem 4.2, we now have that the limit of the optimal cost (of the finite problem) equals the optimal cost of the limiting system. This does not hold in general (see Remark 4.2).
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